METHOD AND APPARATUS FOR EFFECTIVELY 
PERFORMING LINEAR TRANSFORMATIONS 

TECHNICAL FIELD 

The present invention relates to the field of digital signal processing. It introduces 
an improved method and apparatus for performing linear transformations with a reduced 
number of arithmetic operations, simplified circuitry, and low power consumption. 

BACKGROUND 

Linear transformations are commonly used for signal and data processing. From an 
n-dimensional vector, considered as the "input vector", a linear transformation produces an 
r-dimensional vector, considered as the "output vector". In many applications the input vector 
consists of digital samples of a given signal that requires processing. However, applications 
of linear transformations are not restricted to any specific domain: they are essential to every 
field of science and technology, and performing them efficiently is highly desirable. 

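A general linear transformation from the set of n-dimensional vectors (real, complex, 
or over any field of scalars) to the set of r-dimensional vectors (over the same field of 
scalars) may be represented by a matrix of order $r \times n$ given by: 

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix}$$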



A general input n-dimensional column vector is written in the following form: 



$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$






Attorney Docket No. 10559-346001 / P8300 



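The linear transformation produces from the input vector x an output column vector y 
of dimension r, through the product of the matrix by the vector, which is given by the 
following formula: 

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{r1}x_1 + a_{r2}x_2 + \cdots + a_{rn}x_n \end{pmatrix}$$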

A straightforward performance of the linear transformation requires $r \cdot n$ products and 
$r \cdot (n-1)$ additions. 

Binary matrices are of special importance in the development and application of the 
present invention. Binary-bipolar matrices are those containing only ±1 values in their entries. 
Henceforth they will be called U-matrices. A linear transformation represented by a 
U-matrix will be called a U-transformation. In the above presentation, if the given transformation 
is a U-transformation, then $r \cdot (n-1)$ additions/subtractions are required for its 
straightforward performance. Another type of binary matrix contains only 0 and 1 
values in its entries; such matrices will be called 0-1-matrices. A linear transformation represented 
by a 0-1-matrix is called a 0-1-transformation. In the above presentation, about $r \cdot (n-1)/2$ additions 
are required, on average, for a straightforward performance of a 0-1-transformation. 

In the current text the term "binary transformation" covers the two types of 
transformations mentioned above: U-transformations and 0-1-transformations. To complete 
the terminology, the term U-vector refers to a vector having ±1 values in its components and, 
likewise, a vector having 0,1 values in its components will be called a 0-1-vector. 
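To make the straightforward operation counts concrete, here is a minimal sketch, assuming matrices are given as plain Python lists of rows (the function name `naive_additions` is hypothetical, not from the text). A line with k non-zero entries costs k - 1 additions/subtractions, which gives $r \cdot (n-1)$ for any $r \times n$ U-matrix.

```python
def naive_additions(matrix):
    """Count additions/subtractions of the straightforward row-by-row product.

    A line with k non-zero entries (+1/-1 in a U-matrix, 1 in a 0-1-matrix)
    needs k - 1 additions or subtractions; an all-zero line needs none.
    """
    total = 0
    for row in matrix:
        k = sum(1 for a in row if a != 0)
        total += max(k - 1, 0)
    return total


# A 2 x 3 U-matrix: r * (n - 1) = 2 * 2 = 4 operations.
print(naive_additions([[1, -1, 1], [-1, -1, 1]]))  # 4
# A 0-1-matrix: only the entries equal to 1 take part in the sums.
print(naive_additions([[1, 0, 1, 1], [0, 0, 0, 1]]))  # 2
```

For a 0-1-matrix the count depends on the number of 1 entries, which is why the text quotes an average rather than a fixed figure.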
The processing of binary transformations consists of adding and subtracting components 
of the input vector. In hardware they are implemented as an electronic Application Specific 
Integrated Circuit (ASIC), requiring elements such as adders and subtracters. These 
components are costly and energy consuming, and they occupy precious 
area. A hardware implementation of binary transformations with a low consumption of these 
resources is increasingly needed in numerous fields of technology. 

U-transformations are widely used in various aspects of digital signal 
processing, including image processing and wireless communication. 

Currently there is a growing worldwide interest in direct-sequence (DS) code division 
multiple access (CDMA) spread spectrum communication systems. The IS-95 [TIA/EIA/IS- 
95-A, "Mobile Station Base Station Compatibility Standard for Dual-Mode Wideband Spread 
spectrum Cellular System", Feb 27, 1996] is one example of the developing DS-CDMA 
systems. 

In the CDMA transmission technique, multi-user data is sent in a composite signal 
and then multiplied, before transmission, by a Pseudo-random Noise (PN) code, which is a 
U-sequence having random-noise properties (such as low cross-correlation). The spreading 
properties make the transmission resistant to noise, including multi-path noise, jamming, or 
detection. A scrambling code, longer than the channeling code, is also applied. This 
transmission scheme of the second generation (IS-95-B) and third generation (3G) wide band 
(WB) CDMA standards requires the receiver to perform multi-code detection, that is, the task 
of despreading the combined channels simultaneously, each of which was previously spread 
according to a different channeling code. This is done by applying U-transformations 
in which the spreading codes comprising the transformation matrix are often rows of a 
Hadamard matrix. This is one of many examples where effective U-transformation 
techniques are desired for accomplishing the computational tasks with a low consumption of 
resources. 

There are several known mechanisms that improve the performance of specific types 
of linear transformations. A few relevant examples, chosen from a very wide range, are 
mentioned here. Toeplitz transformations, which perform convolutions and are used as filters 
in DSP, can be computed efficiently by the classical Fast Fourier Transform (FFT) 
algorithm, requiring $O(n \log_2 n)$ addition and multiplication operations, where n is the 
dimension of the domain space. In comparison, the standard, straightforward methods 
require $O(n^2)$ additions and multiplications. For details see: [James W. Cooley and John W. 
Tukey, "An algorithm for the machine calculation of complex Fourier series", Mathematics 
of Computation, 19 (90): 297-301, April 1965.] 

A specific type of U-transformation used in digital signal processing, including 
CDMA technologies, is the Walsh-Hadamard Transformation, which is represented by a 
Hadamard matrix. A Hadamard matrix can be defined recursively as follows: 

$$H_1 = [1], \qquad H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

and for every integer n that is a power of 2: 

$$H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}.$$

A low-complexity, energy-preserving algorithm for performing this transformation 
is the Fast Walsh-Hadamard Transform (FWT), which reduces the number of 
additions/subtractions for an $n \times n$ Hadamard matrix to $n \log_2 n$. This provides an optimal 
saving in these operations. The fundamental ideas underlying this method are similar to 
those of the FFT. 

Another feature of the invention modifies the U-transformation aspect to provide an 
efficient method and apparatus for performing Toeplitz U-transformations. These 
transformations represent a full or partial convolution with a given U-sequence or a complex 
U-sequence. Wireless communication applications of Toeplitz U-transformations include 
initial time synchronization and the searcher, which use convolutions with real or complex 
pseudo-random (PN) or Gold U-sequences. A Gold sequence is composed of the $\mathbb{Z}_2$ 
(modulo-2) sum of two PN sequences. 

The above binary aspects of the invention are generalized by another aspect of the 
invention to an efficient method and apparatus for performing a linear transformation 
represented by an $r \times n$ matrix with a relatively small number of different entries. 
Applications of this advanced aspect of the invention include complex U-transformations and 
linear transformations represented by a matrix with $\{0, 1, -1\}$ entries. This broad preferred 
embodiment of the invention will be called the generalized-elimination-method (GEM). 
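As an illustration of the Hadamard recursion and of the $n \log_2 n$ count quoted for the FWT, the following sketch builds $H_n$ by the recursion $H_{2n} = [[H_n, H_n], [H_n, -H_n]]$ and applies the standard in-place butterfly form of the fast Walsh-Hadamard transform. It is a generic textbook FWT under the assumption of natural (Sylvester) row order, not the method of the invention; the function names are hypothetical.

```python
def hadamard(n):
    """Build H_n by the recursion H_1 = [1], H_2n = [[H_n, H_n], [H_n, -H_n]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-a for a in row] for row in h]
    return h


def fwht(x):
    """Fast Walsh-Hadamard transform: return (H_n x, number of additions/subtractions)."""
    y = list(x)
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    ops = 0
    span = 1
    while span < n:  # log2(n) stages
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):  # n/2 butterflies per stage
                a, b = y[i], y[i + span]
                y[i], y[i + span] = a + b, a - b
                ops += 2
        span *= 2
    return y, ops


print(fwht([1, 2, 3, 4]))  # ([10, -2, -4, 0], 8)
```

Each of the $\log_2 n$ stages performs $n/2$ butterflies of one addition and one subtraction, which gives the $n \log_2 n$ total.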
Wide band CDMA, and probably other branches of advanced future DSP technologies, 
would benefit from the simultaneous analysis of an incoming signal at different spreading 
factors. This would enable a multi-code channel that allows the simultaneous reception 
of different types of information, such as fax, normal phone conversation, and internet data, 
without overburdening the network. Another aspect of the present invention is an efficient 
method and apparatus for this type of linear operation. It comprises modified versions of 
the U-binary aspect of the invention with an additional processor. This additional processor 
uses information about the number of additions required by the U-binary method to 
find a configuration that applies the U-binary method to various subsections with a low 
overall number of additions. 

Additional major applications of the invention include wireless multimedia systems, 
personal satellite mobile systems, GPS satellite-based locating systems, and more. In the 
context of mobile communication and other mobile computation-based systems, a reduction 
in the current consumed by linear processing is essential. Applications of the invention 
in mobile phone technologies can prolong battery life, reduce circuitry, and 
shorten response time. 

DESCRIPTION OF DRAWINGS 

These and other features and advantages of the invention will become more apparent 
upon reading the following detailed description and upon reference to the accompanying 
drawings. 

Fig. 1 schematically illustrates the operation of updating the contents of a memory 
employed by an apparatus which performs the transformation using the 0-1-binary 
transformation matrix, according to a preferred embodiment of the invention; 

Fig. 2 schematically illustrates the operation of updating the contents of a memory 
employed by an apparatus which performs the transformation using the U-transformation 
matrix, according to a preferred embodiment of the invention; 

Fig. 3 schematically illustrates the operation of updating the contents of a memory 
employed by an apparatus which performs the transformation using the Toeplitz 
transformation matrix, according to a preferred embodiment of the invention; and 






Fig. 4 is a block diagram of an exemplary apparatus for performing a linear 
transformation with a reduced number of additions, according to a preferred embodiment of 
the invention. 

Fig. 5 illustrates an example of an implementation of a product of an $r \times n$ U-matrix A by 
an n-dimensional vector x according to one embodiment of the present invention. 

DETAILED DESCRIPTION 

Two new terms are fundamental to the description of the method of the 
present invention. The first is that of equivalent vectors. In the text that follows, two non-zero 
vectors of the same dimension are called equivalent if each is a scalar multiple of the 
other. This term may be applied to two rows or two columns of a given matrix. The second 
term is that of the lead element of a column in a given matrix. The definition used throughout 
the present invention is the non-zero component of highest index in each column of the 
matrix. This definition depends on a predefined index order, which is used 
uniformly in the given matrix; choosing a different order yields different lead elements. 
For example, the lead elements of a given matrix may be the bottommost non-zero 
elements or the uppermost non-zero elements of each column, depending on whichever 
order is convenient for that purpose. 
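The two definitions can be stated operationally. The sketch below assumes exact (integer or rational) entries so that the scalar ratio can be compared with `fractions.Fraction`; the function names and the `bottom_up` flag are hypothetical illustrations, not part of the text.

```python
from fractions import Fraction


def are_equivalent(u, v):
    """True if the non-zero vectors u and v satisfy u = c * v for a scalar c != 0."""
    if len(u) != len(v) or not any(u) or not any(v):
        return False
    i = next(k for k, a in enumerate(v) if a != 0)  # first non-zero entry of v
    c = Fraction(u[i]) / Fraction(v[i])             # candidate ratio
    return all(Fraction(a) == c * Fraction(b) for a, b in zip(u, v))


def lead_elements(matrix, bottom_up=True):
    """Per column, return (row index, value) of its lead element, or None for a zero column.

    bottom_up=True picks the non-zero entry of highest row index (bottommost);
    bottom_up=False uses the opposite index order and picks the uppermost one.
    """
    rows = len(matrix)
    order = range(rows - 1, -1, -1) if bottom_up else range(rows)
    leads = []
    for j in range(len(matrix[0])):
        lead = None
        for i in order:
            if matrix[i][j] != 0:
                lead = (i, matrix[i][j])
                break
        leads.append(lead)
    return leads


print(are_equivalent([1, -1, 1], [-1, 1, -1]))    # True
print(lead_elements([[1, 0], [2, 3]]))           # [(1, 2), (1, 3)]
```

If the ratio c were 0, the first vector would have to be zero, which the guard excludes, so a `True` result always means mutual scalar multiples.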

0-1-MATRIX 

The present preferred embodiment of the invention provides an efficient method of 
performing a 0-1-binary linear transformation. Henceforth it will also be called the 
"0-1-method". The following example introduces the 0-1-method and outlines its 
main ideas. 

Example: Given the 0-1-binary $4 \times 14$ matrix A: 

$$A = \begin{pmatrix} 0&1&1&0&1&1&0&0&1&0&0&0&1&1 \\ 1&0&0&1&0&1&1&0&0&0&1&1&1&0 \\ 0&0&1&1&1&1&0&0&0&0&1&0&1&1 \\ 1&1&1&1&0&1&1&0&1&0&1&1&1&1 \end{pmatrix}$$

and a 14-dimensional input vector 

$$x = (x_1, x_2, \ldots, x_{14})^T.$$



Suppose that it is desired to compute the 4-dimensional "output" vector: 

$$y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = A x.$$



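According to a preferred embodiment of the invention, the matrix A is first checked 
for equal lines. Since no two lines are equal, the procedure goes to the next 
step. Next, the output vector y is expressed as a sum of the columns of A, with coefficients that are 
the respective input vector coordinates. Hence: 

$$y = x_1(0,1,0,1)^T + x_2(1,0,0,1)^T + x_3(1,0,1,1)^T + x_4(0,1,1,1)^T + x_5(1,0,1,0)^T + x_6(1,1,1,1)^T + x_7(0,1,0,1)^T + x_8(0,0,0,0)^T + x_9(1,0,0,1)^T + x_{10}(0,0,0,0)^T + x_{11}(0,1,1,1)^T + x_{12}(0,1,0,1)^T + x_{13}(1,1,1,1)^T + x_{14}(1,0,1,1)^T$$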
At this stage, the zero columns, along with their coefficients, are deleted. The 
first step performed by a preferred embodiment of the invention with respect to the input vector is 
to collect and sum the coefficients of any recurring non-zero column. Thus, at the cost of 6 
additions, the following reduced vector equation is obtained: 

$$y = (x_1+x_7+x_{12})(0,1,0,1)^T + (x_2+x_9)(1,0,0,1)^T + (x_3+x_{14})(1,0,1,1)^T + (x_4+x_{11})(0,1,1,1)^T + x_5(1,0,1,0)^T + (x_6+x_{13})(1,1,1,1)^T$$
This expression may be simplified by the following definitions: 

$$W_1 = x_1 + x_7 + x_{12},\quad W_2 = x_2 + x_9,\quad W_3 = x_3 + x_{14},\quad W_4 = x_4 + x_{11},\quad W_5 = x_5,\quad W_6 = x_6 + x_{13},$$

which imply: 

$$y = W_1(0,1,0,1)^T + W_2(1,0,0,1)^T + W_3(1,0,1,1)^T + W_4(0,1,1,1)^T + W_5(1,0,1,0)^T + W_6(1,1,1,1)^T.$$
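This is equivalent to the original problem, but with a reduced number of columns. 
However, at this point this type of reduction is exhausted. To obtain further gains, the vector 
equation is split into the following equivalent set of two vector equations: 

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = W_1\begin{pmatrix}0\\1\end{pmatrix} + W_2\begin{pmatrix}1\\0\end{pmatrix} + W_3\begin{pmatrix}1\\0\end{pmatrix} + W_4\begin{pmatrix}0\\1\end{pmatrix} + W_5\begin{pmatrix}1\\0\end{pmatrix} + W_6\begin{pmatrix}1\\1\end{pmatrix}$$

$$\begin{pmatrix} y_3 \\ y_4 \end{pmatrix} = W_1\begin{pmatrix}0\\1\end{pmatrix} + W_2\begin{pmatrix}0\\1\end{pmatrix} + W_3\begin{pmatrix}1\\1\end{pmatrix} + W_4\begin{pmatrix}1\\1\end{pmatrix} + W_5\begin{pmatrix}1\\0\end{pmatrix} + W_6\begin{pmatrix}1\\1\end{pmatrix}$$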
Each of these two equations is treated separately, as in the first step of this 
preferred embodiment. Thus the coefficients of the same non-zero column are collected and 
summed. Hence, at an additional cost of 6 additions, the method arrives at the following set 
of two vector equations: 

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (W_2+W_3+W_5)\begin{pmatrix}1\\0\end{pmatrix} + (W_1+W_4)\begin{pmatrix}0\\1\end{pmatrix} + W_6\begin{pmatrix}1\\1\end{pmatrix}$$

$$\begin{pmatrix} y_3 \\ y_4 \end{pmatrix} = (W_1+W_2)\begin{pmatrix}0\\1\end{pmatrix} + (W_3+W_4+W_6)\begin{pmatrix}1\\1\end{pmatrix} + W_5\begin{pmatrix}1\\0\end{pmatrix}$$
It is now apparent that only 4 more additions are required to complete the 
computation of the output vector y. Hence the desired result is obtained with a total 
of 16 additions. It can be readily checked that the conventional prior-art method of performing this 
computation would require 28 additions. The following description of this 
preferred embodiment of the invention shows that the relative efficiency of the 
method grows as the dimensions of the matrix grow. 
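The hand computation above can be replayed mechanically to confirm that the 16-addition schedule really yields $A x$. This is a minimal check, assuming integer inputs; `worked_example` is a hypothetical name and the schedule is hard-coded from the equations of the example.

```python
A = [
    [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1],
]


def worked_example(x):
    """Follow the example's steps literally and return the output vector y."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14 = x
    # x8 and x10 multiply zero columns and are never used.
    # First column collection: 6 additions.
    W1, W2, W3 = x1 + x7 + x12, x2 + x9, x3 + x14
    W4, W5, W6 = x4 + x11, x5, x6 + x13
    # Split into lines (1, 2) and (3, 4), second collection: 6 additions.
    a = W2 + W3 + W5  # coefficient of (1, 0) in the upper pair
    b = W1 + W4       # coefficient of (0, 1) in the upper pair
    c = W1 + W2       # coefficient of (0, 1) in the lower pair
    d = W3 + W4 + W6  # coefficient of (1, 1) in the lower pair
    # Final step: 4 additions, 16 in total.
    return [a + W6, b + W6, d + W5, c + d]


x = list(range(1, 15))
direct = [sum(a * v for a, v in zip(row, x)) for row in A]
print(worked_example(x), direct)  # [52, 54, 56, 82] [52, 54, 56, 82]
# Straightforward method: (number of 1s in each line) - 1, summed over the lines.
print(sum(sum(row) - 1 for row in A))  # 28
```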

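In general, the 0-1-binary aspect of the invention is an efficient method and apparatus 
for the product of a binary 0-1-matrix A of dimensions $r \times n$ by an n-dimensional vector x. 
The matrix is given by: 

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix}, \qquad a_{ij} \in \{0, 1\} \text{ for all } 1 \le i \le r,\ 1 \le j \le n,$$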



and the input vector is given by: 

$$x = (x_1, x_2, \ldots, x_n)^T.$$



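The entries of the input vector may be real or complex or belong to any field of scalars (such 
as $\mathbb{Z}_2$, for example). The goal is to compute the product of the matrix A by the 
vector x. This result will be denoted by: 

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_r \end{pmatrix} = A x = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ \vdots \\ a_{r1}x_1 + a_{r2}x_2 + \cdots + a_{rn}x_n \end{pmatrix}.$$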
The procedure described below is recursive, and the following sequence of steps, 
or part of it, is repeated at each iteration. According to a preferred embodiment of 
the invention, the first step is a check for equal lines. When possible, this step should be done 
before any processing of input vectors begins, as a preliminary preparation. If the i-th line 
is found to equal the j-th line, then it can be deduced that $y_i = y_j$, and one of 
the two equal lines is omitted. For the sake of clarity, the line with the larger index 
is always the omitted one: if $j > i$, the j-th line is omitted. This 
initial operation is reversed at the final step of the execution of the invention, where $y_j$, which 
is known at this stage (since it equals $y_i$), is inserted back into its appropriate place in the 
vector y. The omission of equal lines continues until no two lines of the matrix A are 
equal. In practice this stage is skipped on many occasions; it is performed whenever there is 
a reasonable likelihood of equal lines. It is always performed when $\log_2(r) > n$, since this 
condition ensures the existence of equal lines (there are only $2^n$ distinct 0-1 lines of length n). 
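A minimal sketch of the equal-line screening, assuming the matrix is a list of rows: the first (lowest-index) copy of each line is kept, and a map records which kept line supplies each $y_i$ so the omitted outputs can be reinserted at the end. Both function names are hypothetical.

```python
def eliminate_equal_lines(A):
    """Keep the first copy of every line of A and drop later equal lines.

    Returns (reduced, source): reduced holds the distinct lines, and source[i]
    is the index in reduced whose product equals y_i.
    """
    first_index = {}
    reduced, source = [], []
    for row in A:
        key = tuple(row)
        if key not in first_index:
            first_index[key] = len(reduced)
            reduced.append(list(row))
        source.append(first_index[key])
    return reduced, source


def restore_output(y_reduced, source):
    """Put every omitted y_j back in place, using y_j = y_i."""
    return [y_reduced[k] for k in source]


# r = 5 lines of length n = 2, and 5 > 2**2, so some lines must repeat.
A = [[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]]
reduced, source = eliminate_equal_lines(A)
print(reduced)  # [[1, 0], [0, 1], [1, 1]]
print(source)   # [0, 1, 2, 0, 1]
```

Since the screening depends only on the matrix, both outputs can be prepared once, before any input vector arrives.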

To avoid cumbersome notation, the modified matrix resulting from this screening 
process keeps the name A of the original matrix, and the names of its dimensions, 
$r \times n$, are also preserved. In the next stage a summation process is introduced to eliminate 
equal columns. This procedure reduces the number of columns of the modified matrix 
at a minimal cost in additions. First, the matrix-by-vector product $y = A x$ is decomposed 
into the sum of the A-columns multiplied by the respective x-components. Hence: 





$$y = A x = x_1\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{r1} \end{pmatrix} + x_2\begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{r2} \end{pmatrix} + \cdots + x_n\begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{rn} \end{pmatrix}$$

Each element of this sum consists of a column vector of the matrix A multiplied by 
the corresponding scalar coefficient of the vector x. This sum may be written more 
compactly as: 

$$y = A x = x_1 v_1 + x_2 v_2 + \cdots + x_n v_n,$$

where $v_j$ (for every j) is the j-th column of A: $v_j = (a_{1j}, a_{2j}, \ldots, a_{rj})^T$. 



According to a preferred embodiment of the invention, zero columns, along with their 
coefficients, are omitted at this stage. Next, identical non-zero columns are grouped together 
and rearranged so that each distinct column becomes a common factor, multiplied by the sum 
of its corresponding scalar coefficients. Hence the resulting summation comprises the 
subsequence of all distinct non-zero columns $w_1, \ldots, w_m$, extracted from the original 
sequence of non-zero columns by omitting repeated columns. Each column $w_k$ is 
multiplied by a coefficient $t_k$, which is the sum of the original coefficients of all 
columns equal to $w_k$. Clearly $m \le n$, and the gap $n - m$ is the number 
of repetitions in the original sequence $v_1, v_2, \ldots, v_n$. The process described above can be 
formulated mathematically as follows: 

$$y = A x = \sum_{1 \le j \le n} x_j v_j = \sum_{1 \le k \le m} \; \sum_{j:\, v_j = w_k} x_j v_j = \sum_{1 \le k \le m} \Big( \sum_{j:\, v_j = w_k} x_j \Big) w_k$$

Now for all $1 \le k \le m$ define: 

$$t_k = \sum_{j:\, v_j = w_k} x_j$$

The computational task so far is the calculation of the new coefficients $t_1, \ldots, t_m$ as sums 
of the original ones $x_1, \ldots, x_n$. The cost is $n - m$ additions (where n now counts only the 
non-zero columns). The product $y = A x$ is thus given by: 

$$y = A x = \sum_{1 \le k \le m} t_k w_k$$

The management aspect of this computation, namely determining the participants in the 
required sums, may be done once per matrix as a preliminary preparation before the arrival 
of input vectors. To summarize, let B be the $r \times m$ matrix whose columns are, respectively, 
$w_1, \ldots, w_m$, and let: 

$$t = (t_1, t_2, \ldots, t_m)^T;$$

then $y = A x = B t$. Thus the above process reduces the original problem of 
multiplying the $r \times n$ matrix A by the n-vector x to the problem of multiplying the $r \times m$ matrix 
B by the m-vector t. The smaller m is, the larger the gain. 
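A minimal sketch of the column-collection step, assuming a list-of-rows matrix and relying on Python's insertion-ordered `dict` to keep the distinct columns $w_1, \ldots, w_m$ in order of first appearance. `collect_columns` is a hypothetical name; applied to the 4 x 14 example it reproduces $W_1, \ldots, W_6$ and the 6 additions.

```python
def collect_columns(A, x):
    """Return (B, t, additions) such that A x = B t.

    Zero columns of A are dropped with their coefficients; equal non-zero columns
    are merged, and the coefficients of each merged group are summed.
    """
    groups = {}  # distinct non-zero column -> its coefficients, in order of first appearance
    for j, col in enumerate(zip(*A)):
        if any(col):
            groups.setdefault(col, []).append(x[j])
    columns = list(groups)                             # w_1, ..., w_m
    t = [sum(coeffs) for coeffs in groups.values()]    # t_1, ..., t_m
    additions = sum(len(coeffs) - 1 for coeffs in groups.values())
    B = [[col[i] for col in columns] for i in range(len(A))]  # r x m matrix
    return B, t, additions


A = [
    [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1],
]
B, t, additions = collect_columns(A, list(range(1, 15)))
print(t, additions)  # [20, 11, 17, 15, 5, 19] 6
```

In a hardware setting the grouping (which $x_j$ feed which $t_k$) would be fixed once per matrix; only the sums are evaluated per input vector.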

The next step is a horizontal split of the matrix B of the former stage into two or more 
submatrices, keeping lines unbroken. Each submatrix has a reduced number of 
rows, which increases the gain of the above column-collection procedure in the next 
iteration. The lines of the matrix B are denoted by $u_1, u_2, \ldots, u_r$. The product 
under consideration may then be expressed as: 

$$y = B t = \begin{pmatrix} u_1 \cdot t \\ u_2 \cdot t \\ \vdots \\ u_r \cdot t \end{pmatrix}$$

where each line is multiplied in scalar product by the vector t. In view of this 
expression, the task of computing $y = B t$ is equivalent to the combined task of computing the r 
scalar products $u_1 \cdot t, u_2 \cdot t, \ldots, u_r \cdot t$. 

According to a preferred embodiment of the invention, this is done by 
computing the matrix-by-vector products $B_1 t$ and $B_2 t$, where each of the matrices $B_1$ and $B_2$ 
contains a subset of the lines of the matrix B, the two subsets being mutually disjoint and 
their union comprising all lines of B. Usually, except for the first iteration, the two 
submatrices have the same or almost the same number of lines. However, depending on 
the properties of the matrix A, in some special cases the split may be into more than two 
submatrices. Such is the case when the matrix A has no special inner order and its entries may be 
considered arbitrarily chosen. Then a preferred embodiment of the invention usually makes 
the first line split into enough submatrices that each of them satisfies 
$\log_2(\text{number of columns}) > \text{number of rows}$. Subsequent iterations then split the 
lines into two almost equal halves. The case where the matrix is considered arbitrary is 
essentially a worst-case scenario. 
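Combining column collection with the horizontal split gives a recursive procedure. The sketch below is a simplified reading under stated assumptions: it always splits the lines into two halves (no multi-way first split), omits equal-line elimination and the vertical split, and counts only additions. `zero_one_product` is a hypothetical name. On the 4 x 14 example it reproduces the 16 additions of the hand computation.

```python
def zero_one_product(A, x):
    """Recursive sketch of the 0-1-method: collect equal columns, then split the
    lines into two halves and recurse on each half with the collected coefficients.

    Returns (A x, additions).
    """
    groups = {}  # distinct non-zero column -> its coefficients
    for j, col in enumerate(zip(*A)):
        if any(col):  # zero columns are dropped together with their coefficients
            groups.setdefault(col, []).append(x[j])
    additions = sum(len(c) - 1 for c in groups.values())
    columns, t = list(groups), [sum(c) for c in groups.values()]
    r = len(A)
    if r == 1:
        # Only the column (1,) can survive, so its coefficient is the line's value.
        return [t[0] if t else 0], additions
    B = [[col[i] for col in columns] for i in range(r)]
    half = (r + 1) // 2
    y_top, a_top = zero_one_product(B[:half], t)
    y_bottom, a_bottom = zero_one_product(B[half:], t)
    return y_top + y_bottom, additions + a_top + a_bottom


A = [
    [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1],
]
y, additions = zero_one_product(A, list(range(1, 15)))
print(y, additions)  # [52, 54, 56, 82] 16
```

Halving is only one choice; the text describes a wider first split for arbitrary matrices, which this sketch does not attempt.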

Another step that may be inserted instead of, or before, the above horizontal 
split is a vertical split. According to a preferred embodiment of the invention, the above 
sum 

$$y = \sum_{1 \le k \le m} t_k w_k$$

is split in two: 

$$y = \sum_{1 \le k \le p} t_k w_k + \sum_{p+1 \le k \le m} t_k w_k,$$

where p is chosen between 1 and $m - 1$. Thus, for $B_1$, the matrix whose columns are 
$w_1, \ldots, w_p$, and for $B_2$, the matrix whose columns are $w_{p+1}, \ldots, w_m$, it holds 
that: 

$$y = B_1 t' + B_2 t'',$$

where the corresponding vectors are $t' = (t_1, \ldots, t_p)^T$ and $t'' = (t_{p+1}, \ldots, t_m)^T$. Hence, 
in this preferred embodiment of the invention, the two lower-dimensional products $B_1 t'$ and 
$B_2 t''$ are computed separately, and at the end the results, which are two r-vectors, are 
summed together. A preliminary rearrangement of the indices of the B-columns 
$w_1, \ldots, w_m$ may enhance the effectiveness of this step. 

The vertical split is used less frequently than the other procedures. Its main 
use is when the number of rows substantially exceeds the number of columns, a 
situation less common in DSP applications. The fundamental drawback of this step is the 
need to sum the results of the two products $B_1 t'$ and $B_2 t''$, which costs an additional 
r scalar additions that have no parallel in the horizontal split. As with the 
horizontal split, depending on the properties of the matrix A, the vertical split may be into 
more than two matrices. 

Finally, each of the above steps is applied in repeated iterations, thus forming a 
recursive mechanism. 
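The vertical split and its overhead of r extra additions can be shown in a few lines. For brevity this sketch, assuming list-of-rows matrices, computes the two sub-products $B_1 t'$ and $B_2 t''$ directly with a hypothetical helper `matvec`; in the method they would themselves be computed by the recursive procedure.

```python
def matvec(B, t):
    """Straightforward product B t (used here only for the two sub-products)."""
    return [sum(b * v for b, v in zip(row, t)) for row in B]


def vertical_split_product(B, t, p):
    """Compute B t as B1 t' + B2 t'' with B1 = columns 1..p and B2 = columns p+1..m.

    Returns (y, extra), where extra = r is the number of additions spent on
    summing the two r-vectors: the overhead the horizontal split does not have.
    """
    if not 1 <= p <= len(t) - 1:
        raise ValueError("p must lie between 1 and m - 1")
    B1 = [row[:p] for row in B]
    B2 = [row[p:] for row in B]
    y1 = matvec(B1, t[:p])
    y2 = matvec(B2, t[p:])
    return [a + b for a, b in zip(y1, y2)], len(B)


B = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
t = [2, 3, 5, 7]
print(vertical_split_product(B, t, 2))  # ([14, 8, 12], 3)
```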

Next, a bound is presented on the savings enabled by the above method. For an $r \times n$ 
0-1-matrix A such that $r < \log_2(n)$, the number of additions in the worst case, denoted by $s^*(n, r)$, 
is given by the expression: 

$$s^*(n, r) = n + s^*(r).$$

The following table gives the bound on the number of additions for matrices with a small 
number of rows. 

$$\begin{array}{ll} s^*(2) = -1 & s^*(n,2) = n - 1 \\ s^*(3) = 2 & s^*(n,3) = n + 2 \\ s^*(4) = 7 & s^*(n,4) = n + 13 \\ s^*(5) = 22 & s^*(n,5) = n + 49 \end{array}$$

The worst case obeys the following bound: 

$$s^*(r) < 2^r + 2^{r/2} + 2 - r.$$

The standard (prior art) method requires $u(A) - r$ additions, where $u(A)$ is the number 
of 1s in the matrix A. On average (in expectation, when A is arbitrary) $u(A) = n \cdot r/2$, so the 
standard method requires about $(n-2) \cdot r/2$ additions. Since $s^*(n, r)/n$ approaches 1 as n goes 
to infinity while r is constant (or $r < c \cdot \log_2(n)$ where $0 < c < 1$), this method is 
asymptotically optimal (for bounded r and unbounded n). 

When A is an $r \times n$ 0-1-matrix and no assumptions are made on its dimensions, the 
number of additions required by the invention to compute the product $A x$ is bounded by 

$$(1 + \varepsilon)\, \frac{n \cdot r}{\log_2(n)}, \qquad 0 < \varepsilon < 1,$$

where $\varepsilon$ goes to zero when both r and n go to infinity. In the case $r > n$, a tighter bound 
exists, which results from the application of the vertical split: 

$$(1 + \varepsilon)\, \frac{n \cdot r}{\log_2(r)}, \qquad 0 < \varepsilon < 1,$$

where $\varepsilon$ goes to zero when both r and n go to infinity. 

To assess the efficiency of this process, it should be observed that the present 
invention increases management operations and reduces additions. However, additions are a 
major part of the complexity, and their reduction by the invention has a dominant 
effect. Moreover, most of the management operations are done once per matrix, before the 
processing of data vectors begins. Thus the suggested method substantially saves power 
consumption and circuitry. The above 0-1-binary aspect may be particularly efficient when 
the given matrix has some degree of sparsity. This is often the case when this 
embodiment is combined with distributed arithmetic in the computation of the 
product of a general real matrix by a vector, as described in the real-matrix aspect of the 
invention. 

In addition, this aspect of the invention is applicable to algebraic codes (see: [Shu Lin, 
Daniel J. Costello, Jr., "Error Control Coding: Fundamentals and Applications", Prentice-Hall, 
Inc., Englewood Cliffs, N.J., 1983]), where in the process of coding and decoding a $\mathbb{Z}_2$-matrix 
is multiplied by a $\mathbb{Z}_2$-vector (or a $\mathbb{Z}_2^n$-vector). This holds in any situation where a $\mathbb{Z}_2$-matrix is 
multiplied by a $\mathbb{Z}_2^n$-vector. Finally, the above 0-1-binary embodiment and the following U-binary 
embodiment are easily interchangeable in every field of scalars except fields of 
characteristic 2. In each particular case the more efficient embodiment is chosen. 

U-MATRIX 

A preliminary concept required for the description of the U-binary preferred embodiment of 
the invention (henceforth also called the "U-method") is the equivalence of 
vectors. Two non-zero vectors of the same dimension are called equivalent if one is a 
scalar multiple of the other. In the context of this embodiment, note that two U-vectors are 
equivalent if they are equal or if one of them is the (-1) multiple of the other. The following 
preferred embodiment is similar in most details to the one above. The main difference 
is that here elimination is applied to equivalent (row or column) vectors rather 
than to equal vectors. This speeds up the elimination rate and thus enhances the efficiency. This 
embodiment is introduced through an example demonstrating the essence of its ideas: 
Example: Consider the product of the following $5 \times 13$ U-matrix A by a 13-dimensional input 
vector x. 

The matrix is given by: 

$$A = \begin{pmatrix} -1&1&-1&-1&1&1&-1&1&1&-1&1&-1&1 \\ 1&-1&-1&1&1&-1&1&-1&-1&1&-1&-1&1 \\ 1&1&-1&1&-1&-1&-1&1&-1&-1&1&1&1 \\ -1&1&1&-1&-1&1&-1&1&1&-1&1&1&-1 \\ -1&1&-1&-1&1&1&-1&1&1&-1&1&1&1 \end{pmatrix}$$

and the input vector is given by: 

$$x = (x_1, x_2, \ldots, x_{13})^T.$$

The result of the product is written as: 

$$y = (y_1, y_2, y_3, y_4, y_5)^T = A x.$$

The first step in the process is to check for equivalent lines. In practice this is a one-time 
operation done before the arrival of input vectors. A scan discovers one occurrence of 
equivalent lines in the given matrix: line 2 is equivalent to line 4; in fact, line 4 equals (-1) 
times line 2. It follows that $y_4 = -y_2$. Therefore it is not necessary to compute both $y_4$ and $y_2$, 
and the computation of $y_4$ is omitted. Thus line 4 is deleted from the matrix A. The 
resulting matrix A' is given by: 

$$A' = \begin{pmatrix} -1&1&-1&-1&1&1&-1&1&1&-1&1&-1&1 \\ 1&-1&-1&1&1&-1&1&-1&-1&1&-1&-1&1 \\ 1&1&-1&1&-1&-1&-1&1&-1&-1&1&1&1 \\ -1&1&-1&-1&1&1&-1&1&1&-1&1&1&1 \end{pmatrix}$$



The corresponding output vector is given by: 

$$y' = (y_1, y_2, y_3, y_5)^T,$$

and it holds that $y' = A' x$. 

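This is the first reduction. In the next step the product $A' x$ is decomposed into a sum of 
the A'-columns multiplied by the respective x-components. Therefore: 

$$y' = x_1(-1,1,1,-1)^T + x_2(1,-1,1,1)^T + x_3(-1,-1,-1,-1)^T + x_4(-1,1,1,-1)^T + x_5(1,1,-1,1)^T + x_6(1,-1,-1,1)^T + x_7(-1,1,-1,-1)^T + x_8(1,-1,1,1)^T + x_9(1,-1,-1,1)^T + x_{10}(-1,1,-1,-1)^T + x_{11}(1,-1,1,1)^T + x_{12}(-1,-1,1,1)^T + x_{13}(1,1,1,1)^T$$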
Next, each of the column vectors in the above sum is normalized so that the upper 
component of each modified vector is 1. Thus the vector and its coefficient are both 
multiplied by (-1) if the upper component of the vector is -1; no modification is made 
if the upper component of the vector is 1. The following normalized form of the above sum 
is derived: 

$$y' = (-x_1)(1,-1,-1,1)^T + x_2(1,-1,1,1)^T + (-x_3)(1,1,1,1)^T + (-x_4)(1,-1,-1,1)^T + x_5(1,1,-1,1)^T + x_6(1,-1,-1,1)^T + (-x_7)(1,-1,1,1)^T + x_8(1,-1,1,1)^T + x_9(1,-1,-1,1)^T + (-x_{10})(1,-1,1,1)^T + x_{11}(1,-1,1,1)^T + (-x_{12})(1,1,-1,-1)^T + x_{13}(1,1,1,1)^T$$
In this normalized sum the upper component of each vector is 1, and the number of 
different vectors is reduced. The next step is to collect and sum the coefficients of the same 
column. Thus, at the cost of 8 additions, the following equation is obtained: 

$$y' = \big((-x_1)+(-x_4)+x_6+x_9\big)(1,-1,-1,1)^T + \big(x_2+(-x_7)+x_8+(-x_{10})+x_{11}\big)(1,-1,1,1)^T + \big((-x_3)+x_{13}\big)(1,1,1,1)^T + x_5(1,1,-1,1)^T + (-x_{12})(1,1,-1,-1)^T$$
New coefficients are now defined: 

$$w_1 = (-x_1) + (-x_4) + x_6 + x_9,\quad w_2 = x_2 + (-x_7) + x_8 + (-x_{10}) + x_{11},\quad w_3 = (-x_3) + x_{13},\quad w_4 = x_5,\quad w_5 = -x_{12}.$$

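Hence the above equation may be written in the following form: 

$$y' = w_1(1,-1,-1,1)^T + w_2(1,-1,1,1)^T + w_3(1,1,1,1)^T + w_4(1,1,-1,1)^T + w_5(1,1,-1,-1)^T$$

The coefficients $w_1, w_2, w_3, w_4, w_5$ are known to the processor at this stage. In the 
next stage this vector equation is split horizontally into the following set of two equations: 

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = w_1\begin{pmatrix}1\\-1\end{pmatrix} + w_2\begin{pmatrix}1\\-1\end{pmatrix} + w_3\begin{pmatrix}1\\1\end{pmatrix} + w_4\begin{pmatrix}1\\1\end{pmatrix} + w_5\begin{pmatrix}1\\1\end{pmatrix}$$

$$\begin{pmatrix} y_3 \\ y_5 \end{pmatrix} = w_1\begin{pmatrix}-1\\1\end{pmatrix} + w_2\begin{pmatrix}1\\1\end{pmatrix} + w_3\begin{pmatrix}1\\1\end{pmatrix} + w_4\begin{pmatrix}-1\\1\end{pmatrix} + w_5\begin{pmatrix}-1\\-1\end{pmatrix}$$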
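Now each of these equations is processed separately, in the same manner as in the 
first iteration. The upper equation requires no normalization, since it inherited the normalized 
state of the previous equation; normalization is therefore required only for the second 
equation. Hence the resulting set is: 

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = w_1\begin{pmatrix}1\\-1\end{pmatrix} + w_2\begin{pmatrix}1\\-1\end{pmatrix} + w_3\begin{pmatrix}1\\1\end{pmatrix} + w_4\begin{pmatrix}1\\1\end{pmatrix} + w_5\begin{pmatrix}1\\1\end{pmatrix}$$

$$\begin{pmatrix} y_3 \\ y_5 \end{pmatrix} = (-w_1)\begin{pmatrix}1\\-1\end{pmatrix} + w_2\begin{pmatrix}1\\1\end{pmatrix} + w_3\begin{pmatrix}1\\1\end{pmatrix} + (-w_4)\begin{pmatrix}1\\-1\end{pmatrix} + (-w_5)\begin{pmatrix}1\\1\end{pmatrix}$$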






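In the next step the coefficients of the same vector are collected and summed. Thus, 
with 6 additional additions, the outcome is: 

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (w_1+w_2)\begin{pmatrix}1\\-1\end{pmatrix} + (w_3+w_4+w_5)\begin{pmatrix}1\\1\end{pmatrix}$$

$$\begin{pmatrix} y_3 \\ y_5 \end{pmatrix} = (-w_1-w_4)\begin{pmatrix}1\\-1\end{pmatrix} + (w_2+w_3-w_5)\begin{pmatrix}1\\1\end{pmatrix}$$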



Clearly, the vector $y'$ is found with 4 more additions. To obtain the original output 
vector y, it is only necessary to recall that $y_4 = -y_2$. Thus a total of 18 additions was 
required for this process. Observe that conventional prior-art methods would require a 
total of 60 additions to compute the output vector. 
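The U-method example can be reproduced with a short recursive sketch. The assumptions here: lines are always split into halves, equivalent-line elimination is applied only once at the top (by hand, using $y_4 = -y_2$), and sign flips are treated as free because they only turn an addition into a subtraction. The function names are hypothetical. On the 5 x 13 example the count comes out at the 18 additions stated above.

```python
def normalize_and_collect(U, x):
    """Normalize every column of U so its top entry is +1 (flipping the sign of its
    coefficient when needed), then merge equal columns by summing their coefficients.

    Returns (distinct_columns, coefficients, additions).
    """
    groups = {}  # normalized column -> its coefficients, in order of first appearance
    for j, col in enumerate(zip(*U)):
        coeff = x[j]
        if col[0] == -1:
            col = tuple(-a for a in col)
            coeff = -coeff  # sign change only; not counted as an addition
        groups.setdefault(col, []).append(coeff)
    additions = sum(len(c) - 1 for c in groups.values())
    return list(groups), [sum(c) for c in groups.values()], additions


def u_product(U, x):
    """Recursive sketch of the U-method: normalize, collect, split lines into halves."""
    columns, t, additions = normalize_and_collect(U, x)
    r = len(U)
    if r == 1:
        # After normalization every column is (1,), so a single coefficient remains.
        return [t[0]], additions
    B = [[col[i] for col in columns] for i in range(r)]
    half = (r + 1) // 2
    y_top, a_top = u_product(B[:half], t)
    y_bottom, a_bottom = u_product(B[half:], t)
    return y_top + y_bottom, additions + a_top + a_bottom


A = [
    [-1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, 1],
    [1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1],
    [1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1],
    [-1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1],
    [-1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, 1],
]
x = list(range(1, 14))
# Line 4 equals -(line 2): drop it (giving A') and restore y4 = -y2 at the end.
A_prime = [A[0], A[1], A[2], A[4]]
(y1, y2, y3, y5), additions = u_product(A_prime, x)
y = [y1, y2, y3, -y2, y5]
print(additions)  # 18
```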



The U-binary preferred embodiment of the invention is an efficient method for the 
product of an $r \times n$ U-matrix A by an n-dimensional input vector x. It usually provides more 
gain than the 0-1 preferred embodiment and has more applications. 

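The matrix A is given by: 

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix},$$

where $a_{ij} = \pm 1$ for all $1 \le i \le r$, $1 \le j \le n$, 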



and the input vector is written as: 

$$x = (x_1, x_2, \ldots, x_n)^T.$$

The entries of the input vector may be real or complex or belong to any field of scalars with 
characteristic other than 2. The goal is to compute the result of the product: 

$$y = A x.$$
The first step is a modification of the given matrix by the elimination of equivalent 
lines, a step that may be part of the preliminary preparations. If the i-th line is found to be 
equivalent to the j-th line, then it can be deduced that $y_j = \pm y_i$. Therefore a preferred 
embodiment of the first step of the invention omits one of the two equivalent lines. 
For the sake of clarity, the policy is that the line with the larger 
index is always the omitted one; thus, supposing $j > i$, the j-th line 
is omitted. This initial operation is reversed at the final step of the process, where $y_j$, 
which is known at this stage, is put back into its place in the vector y. This elimination process 
continues until no two lines of the modified matrix are equivalent. 

This stage is performed whenever there is a practical probability of equivalent lines. This is often the case in multi-product applications of the U-method (another aspect of the present invention). This stage is always performed when $\log_2(r)>n$: there are only $2^{n-1}$ distinct U-lines of length n up to sign, so this condition ensures the existence of equivalent lines. However, there are other conditions that ensure the existence of equivalent lines and should be taken into consideration; this may be the case, for example, in a sub-matrix of the Hadamard matrix. To avoid clumsy notation, the resulting modified matrix keeps the name A of the original matrix, and the names of the dimensions r×n are also preserved.
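A minimal sketch of the line-elimination step and its reversal, assuming the U-matrix is held as a Python list of ±1 rows (the function names are hypothetical, not part of the source). Multiplying a line by its own first entry gives a canonical form with a leading +1, so two lines are equivalent exactly when their canonical forms coincide; following the stated policy, the line with the larger index is the one dropped.

```python
def eliminate_equivalent_lines(A):
    """Drop lines of a +/-1 matrix that equal +/- an earlier line.

    A is a list of rows with entries +1/-1.  Returns (kept, recover) where
    recover[i] = (k, s) records that original line i equals s * kept[k],
    so the outputs satisfy y[i] = s * y_kept[k].
    """
    kept = []      # surviving lines, in original order (smaller index wins)
    seen = {}      # canonical line (first entry +1) -> (index in kept, sign of kept line)
    recover = []
    for row in A:
        s = row[0]                              # +1 or -1
        canon = tuple(s * a for a in row)       # row = s * canon
        if canon not in seen:
            seen[canon] = (len(kept), s)
            kept.append(list(row))
        k, s_kept = seen[canon]
        recover.append((k, s * s_kept))         # row = (s * s_kept) * kept[k]
    return kept, recover


def restore(y_kept, recover):
    """Put the omitted outputs back in place (the final step of the process)."""
    return [s * y_kept[k] for k, s in recover]


# Usage: lines 3 and 4 are equivalent to lines 1 and 2.
A = [[1, -1, 1], [1, 1, 1], [-1, 1, -1], [1, 1, 1]]
x = [2, 3, 5]
kept, recover = eliminate_equivalent_lines(A)
y_kept = [sum(a * b for a, b in zip(row, x)) for row in kept]
print(restore(y_kept, recover))   # [4, 10, -4, 10]
```

Only the two surviving lines are multiplied by x; the other two outputs are recovered by sign changes alone.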

The next step is the elimination of equivalent columns. This reduces the horizontal dimension (i.e., the number of columns) of the modified matrix at minimal cost in additions. All the management related to this step is done, as in the 0-1 aspect, once per matrix, usually as a preliminary preparation prior to the arrival of data vectors. First, the product $y=A\cdot x$ is expressed as a sum of the columns of A multiplied by the corresponding x-components.



Let $v_j$ (for every j) be the j-th column of A:

$$v_j=\begin{pmatrix}a_{1j}\\ a_{2j}\\ \vdots\\ a_{rj}\end{pmatrix}.$$

Then:

$$A\cdot x=x_1v_1+x_2v_2+\cdots+x_nv_n.$$
Next, the upper element of each column vector of the matrix A is examined and normalized to +1 if it is not already +1. Hence, when $a_{1j}=+1$ no normalization is required, whereas if $a_{1j}=-1$ normalization takes place: the corresponding j-th column vector is multiplied by -1, and so is its corresponding coefficient $x_j$. A new representation of the above sum is thus obtained, in which the upper entry of each vector is equal to +1. In this representation equivalent columns are always equal.

Identical columns are then grouped together, as in the above 0-1 embodiment, and rearranged by taking each column as a common factor, multiplied by the sum of its corresponding scalar coefficients. Hence, the resulting summation comprises the subsequence of all distinct normalized column vectors $w_1,\dots,w_m$ (having no repetitions), extracted from the sequence of normalized column vectors by screening out repeating normalized columns. Each distinct normalized column $w_k$ is multiplied by a coefficient $t_k$, which is the sum of the corresponding normalized coefficients of all normalized columns that are equal to this column. Clearly $m\le n$, and the difference $n-m$ is the number of repetitions in the original sequence of normalized columns $a_{11}v_1,a_{12}v_2,\dots,a_{1n}v_n$.

Mathematically the process is (using $a_{1j}^2=1$):

$$A\cdot x=\sum_{1\le j\le n}x_jv_j=\sum_{1\le k\le m}\ \sum_{j:\,a_{1j}v_j=w_k}(a_{1j}x_j)(a_{1j}v_j)=\sum_{1\le k\le m}\Big(\sum_{j:\,a_{1j}v_j=w_k}a_{1j}x_j\Big)w_k.$$

For all $1\le k\le m$ define:

$$t_k=\sum_{j:\,a_{1j}v_j=w_k}a_{1j}x_j.$$

The main computational task involved in this part of the invention is the calculation of the new coefficients $t_1,\dots,t_m$ as sums of the normalized original ones. The cost in additions is $n-m$. The product $A\cdot x$ is thus given by:

$$A\cdot x=\sum_{1\le k\le m}t_kw_k.$$
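The normalization and merging of columns can be sketched as follows, assuming the matrix is a list of ±1 rows and that sign changes cost nothing; `merge_columns` is a hypothetical name. It returns the distinct normalized columns $w_k$, the coefficients $t_k$, and the $n-m$ additions spent.

```python
def merge_columns(A, x):
    """Column elimination for a +/-1 matrix A (list of r rows) and input x.

    Each column v_j is normalized to a_1j * v_j (upper entry +1) and its
    coefficient to a_1j * x_j; equal normalized columns are merged.
    Returns (W, t, additions): the distinct columns w_1..w_m, the merged
    coefficients t_1..t_m, and the number of additions spent (n - m).
    """
    groups = {}            # normalized column -> t_k (dicts keep insertion order)
    additions = 0
    for j, xj in enumerate(x):
        a1j = A[0][j]
        w = tuple(a1j * row[j] for row in A)   # normalized column, upper entry +1
        if w in groups:
            groups[w] += a1j * xj               # one addition per repeated column
            additions += 1
        else:
            groups[w] = a1j * xj                # sign change only, no addition
    W = list(groups)
    t = [groups[w] for w in W]
    return W, t, additions


A = [[1, -1, 1, 1],
     [1, 1, -1, 1]]
x = [1, 2, 3, 4]
W, t, adds = merge_columns(A, x)
print(W, t, adds)   # [(1, 1), (1, -1)] [5, 1] 2
# A x recovered as sum_k t_k w_k:
y = [sum(tk * wk[i] for tk, wk in zip(t, W)) for i in range(len(A))]
print(y)            # [6, 4]
```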

The gain of the column elimination process depends on the structure of the original matrix A. It is significant for some of the matrices that are commonly used in coding and decoding signal vectors in wireless communication, such as certain types of submatrices of the Hadamard or the cyclic PN matrix. It is also substantial when r, the number of lines of the matrix A, is small relative to the number of columns n. In particular, since there are at most $2^{r-1}$ distinct normalized columns, $m\le 2^{r-1}$; hence when $r<\log_2(n)$ it follows that $n-m\ge n-2^{r-1}>0$. These observations are largely true also for the 0-1 matrix.

The remaining part of this preferred embodiment of the invention (horizontal and vertical splitting, and iteration) is equivalent to its 0-1 counterpart.
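The following is a rough sketch of the U-method for the horizontal-splitting case only, assuming lists of ±1 rows and splitting the lines into blocks of $\lceil r/2\rceil$ and $\lfloor r/2\rfloor$ lines. This split is an assumption chosen to agree with the recursion for $s(r)$; vertical splitting is not shown, and `u_method` is a hypothetical name. The sketch counts additions so that the worst case can be compared with $n+s(r)$.

```python
def u_method(A, x):
    """Sketch of the U-method: merge equal normalized columns, then split
    the lines into an upper and a lower block and recurse on each block.

    A is a list of r rows with entries +1/-1; x has one entry per column.
    Returns (y, additions); sign changes are treated as free.
    """
    r = len(A)
    groups, additions = {}, 0
    for j, xj in enumerate(x):
        s = A[0][j]                               # upper entry of column j
        w = tuple(s * row[j] for row in A)        # normalize: upper entry +1
        if w in groups:
            groups[w] += s * xj
            additions += 1
        else:
            groups[w] = s * xj
    W = list(groups)
    t = [groups[w] for w in W]
    if r == 1:                                    # a single line: one merged column
        return [t[0] if t else 0], additions
    top = (r + 1) // 2                            # block sizes ceil(r/2) and floor(r/2)
    y = []
    for lines in (range(top), range(top, r)):
        block = [[w[i] for w in W] for i in lines]  # merged columns restricted to the block
        y_block, a = u_method(block, t)
        y.extend(y_block)
        additions += a
    return y, additions


# Worst case for r = 3: each of the 2^(r-1) = 4 normalized columns appears twice.
A = [[1, -1,  1,  1,  1, -1,  1,  1],
     [1, -1,  1,  1, -1,  1, -1, -1],
     [1, -1, -1, -1,  1, -1, -1, -1]]
x = [3, 1, 4, 1, 5, 9, 2, 6]
y, adds = u_method(A, x)
print(y, adds)   # [11, 3, -15] 11
```

Here the count equals $n+s(3)=8+3$, against $(n-1)\cdot r=21$ additions for the row-by-row method.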

To view the saving enabled by the use of the invention, note first that additions are the main part of the complexity. For an r×n U-matrix A such that $r<\log_2(n)$, the number of additions in the worst case, denoted by $s(n,r)$, is given by:

$$s(n,r)=n+s(r),$$

where:

$s(1)=-1$, $s(n,1)=n-1$
$s(2)=0$, $s(n,2)=n$
$s(3)=3$, $s(n,3)=n+3$
$s(4)=8$, $s(n,4)=n+8$
$s(5)=19$, $s(n,5)=n+19$
$s(6)=38$, $s(n,6)=n+38$
$s(7)=75$, $s(n,7)=n+75$
$s(8)=144$, $s(n,8)=n+144$
$s(9)=283$, $s(n,9)=n+283$

The following bound is always valid:

$$s(r)<2^{r-1}+2^{r/2+1}-r.$$
The prior-art conventional method requires $(n-1)\cdot r$ additions. Since $s(n,r)/n$ approaches 1 as n goes to infinity with r held constant, this method is asymptotically optimal. The more complex multi-product embodiment of the invention, described later, requires the precise worst-case number of additions. This is provided by the following recursion formula:

For even r: $s(r)=2^{r-1}+2s(r/2)$

For odd r: $s(r)=2^{r-1}+s((r+1)/2)+s((r-1)/2)$
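The recursion can be evaluated directly. The short sketch below (function names are hypothetical) reproduces the table of $s(r)$ above and compares $s(n,r)$ with the $(n-1)\cdot r$ additions of the conventional method.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def s(r: int) -> int:
    """Worst-case additions beyond n for the U-method, from the recursion."""
    if r == 1:
        return -1
    # even r: 2^(r-1) + 2 s(r/2); odd r: 2^(r-1) + s((r+1)/2) + s((r-1)/2)
    return 2 ** (r - 1) + s((r + 1) // 2) + s(r // 2)


def s_nr(n: int, r: int) -> int:
    """Worst-case additions for an r x n U-matrix: s(n, r) = n + s(r)."""
    return n + s(r)


def prior_art(n: int, r: int) -> int:
    """Additions used by the conventional line-by-line method."""
    return (n - 1) * r


print([s(r) for r in range(1, 10)])         # [-1, 0, 3, 8, 19, 38, 75, 144, 283]
print(s_nr(1024, 8), prior_art(1024, 8))    # 1168 8184
```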

When A is an r×n U-matrix and there are no restrictions on the dimensions r×n of the matrix A, the number of additions required by the invention to compute the product $A\cdot x$ in the worst case is always bounded by $(1+\varepsilon)\,n\cdot r/\log_2(n)$, where $0<\varepsilon<1$ and $\varepsilon$ goes to zero when both r and n go to infinity. In the case $r>n$ a tighter bound (with respect to this case) exists, which results from the application of the vertical split:

$$(1+\varepsilon)\,n\cdot r/\log_2(r),$$

where $0<\varepsilon<1$ and $\varepsilon$ goes to zero when both r and n go to infinity.

However, if the matrix A has a specific type of structure, the number of additions required by the preferred embodiment of the present invention to compute the product $A\cdot x$ may drop drastically. The most general conditions for this are beyond the scope of the current text, but some examples are mentioned. In the cases where A is an n×n Hadamard or cyclic pseudo-random (PN) matrix, only $n\log_2(n)$ additions and a memory of n scalars are required, which is optimal in this respect. This remark also holds for the 0-1 matrix.
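For the Hadamard case, the $n\log_2(n)$ figure can be illustrated with the textbook fast Walsh-Hadamard transform. This is a standard technique, shown here only to make the count concrete; it is not claimed to be the procedure of the invention, and the Sylvester ordering of the Hadamard matrix is an assumption.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in Sylvester ordering.

    Returns (H x, additions) for len(x) a power of two.  Each butterfly
    costs one addition and one subtraction, n*log2(n) in total, and the
    work is done in a single buffer of n scalars.
    """
    y = list(x)
    n = len(y)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h, additions = 1, 0
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
                additions += 2
        h *= 2
    return y, additions


def hadamard_row(i, n):
    """Row i of the n x n Sylvester-Hadamard matrix: (-1)^popcount(i & j)."""
    return [-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)]


x = [3, 1, 4, 1, 5, 9, 2, 6]
y, adds = fwht(x)
direct = [sum(h * v for h, v in zip(hadamard_row(i, 8), x)) for i in range(8)]
print(y == direct, adds)   # True 24
```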

EXAMPLES OF INDUSTRIAL APPLICATIONS 

There are several technologies that will benefit from implementation of the preferred embodiment of the invention described above. One such technology is image processing, which encompasses several procedures that include the product of a U-matrix by a vector. In wireless communication (CDMA, IS-95, and the more advanced third-generation wideband CDMA) there are several processes that employ the product of a U-matrix by a vector. By the use of the invention these processes can be done with less energy consumption and circuitry, and sometimes also with less running time.

In the context of multi-code in IS-95-B, 8 lines of the Hadamard-64 matrix comprise a U-matrix which is multiplied by a vector whose entries are samples of a signal. Another application is neighbor detection, which is done by a despreader; here too there is a product of a matrix composed of a few Hadamard lines by a data vector. Another application is the searcher, which searches for high correlation of sequences by computing their mutual scalar products. The correlator sequences are extracted from pseudo-random (PN) sequences, which are U-sequences, and their scalar products with a data vector are the required computation. Initial acquisition is one instance of a searcher.

GENERALIZED-ELIMINATION-METHOD 

The above binary aspects of the invention are characterized by iterations of the following processes: zero lines and equivalent lines (meaning that one is a scalar multiple of the other) are omitted; zero columns are omitted along with the scalar components of the input vector associated with them; equivalent columns are grouped and summed together; and the matrix is split horizontally or vertically after these two features are exhausted. According to a preferred embodiment of the invention, this iterative sequence of steps may be applied to any product of a matrix by a vector, in any field of scalars. This broader preferred embodiment of the invention is called the generalized-elimination-method (GEM).

The summation of equivalent columns is fundamentally a repetition of the following process. Let $v_1,v_2,\dots,v_n$ be the columns of the transformation matrix A and $x=(x_1,x_2,\dots,x_n)$ be the input vector; the goal is to compute $A\cdot x$. Suppose that, after an appropriate rearrangement of the indexes, each of the columns $v_{k+1},\dots,v_n$ (for some $k<n$) is a scalar multiple of the k-th column, that is, for $k<j\le n$: $v_j=z_jv_k$. Then the n-dimensional vector x is replaced by the reduced k-dimensional vector:

$$x'=(x_1,x_2,\dots,x_{k-1},x'_k),\qquad x'_k=x_k+x_{k+1}z_{k+1}+\cdots+x_nz_n,$$

and the r×n matrix A is replaced by the reduced r×k matrix A' whose columns are $v_1,v_2,\dots,v_k$. It then holds that $A'\cdot x'=A\cdot x$. According to a preferred embodiment of the invention, this process, including if necessary a change of indexes, is repeated until no two columns are equivalent. Indeed, this process is simply a generalization of the normalization process that takes place in the U-matrix embodiment of the invention. In practice all the above equivalent-column reductions may be done simultaneously. When this stage is done, it is considered the first iteration of the whole process. Then the matrix is split horizontally, and in each split the above column elimination repeats in a recursive manner, as was done in the above embodiments.

As is demonstrated by the U-embodiment and by the next example, an efficient method to eliminate equivalent columns is to first divide each column by its uppermost component and accordingly multiply the corresponding coefficient by the same uppermost component. The result is that the uppermost component of each modified column is 1. However, sometimes in applications of the GEM the uppermost component of a column in the transformation matrix is zero, making this division impossible. In such cases the simple solution is to divide each nonzero column by its uppermost nonzero component and accordingly multiply the corresponding coefficient by the same uppermost nonzero component. The result is that the uppermost nonzero component of each modified column is 1. This has the desirable consequence that two columns are equivalent in the modified matrix if and only if they are equal.

To indicate cases where this preferred embodiment of the invention enhances efficiency, consider a matrix having its entries in a relatively small finite set S. In practice this set S may be a finite field, a finite subgroup of the multiplicative group of a field, or a finite subset of a field closed under multiplication, and in greater generality any finite subset of a field. It includes the binary cases discussed above, where $S=\{0,1\}$ or $S=\{1,-1\}$, and other very common situations where $S=\{0,1,-1\}$, $S=\{1,-1,j,-j\}$ or $S=\{0,1,-1,j,-j\}$. The gain decreases with the size of S, and the general rule holds that the GEM reduces the number of additions by a factor of $\log_{|S|}(n)$, where n is the number of columns in the given transformation matrix. When applicable, the GEM should also be compared for efficiency with the complex and general embodiments that will be described later.

To illustrate the way this method works, it is demonstrated for a matrix whose entries are from the set $U_2=\{1,-1,j,-j\}$, called a $U_2$-matrix. Matrices of this type appear often in practice. However, it must be emphasized again that the GEM is not limited to any specific type of matrix. The number of additions for the product of an r×n $U_2$-matrix by a vector is bounded by $C=n+4^{r-1}+o(4^{r-1})$.
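A minimal sketch of the GEM column step for matrices over $\{0,1,-1,j,-j\}$, normalizing each nonzero column by its uppermost nonzero entry and dropping zero columns. As an assumption made to keep the arithmetic exact, an entry $j^k$ is stored as the exponent k and a zero entry as `None`; the function names are hypothetical.

```python
UNIT = (1, 1j, -1, -1j)   # UNIT[k] = j**k


def gem_merge_columns(A, x):
    """GEM column elimination for a matrix over {0, 1, -1, j, -j}.

    Entries of A are encoded as None (for 0) or k in {0, 1, 2, 3} (for j**k).
    Each nonzero column is divided by its uppermost nonzero entry and its
    coefficient multiplied by that entry; zero columns are dropped.
    Returns (columns, coefficients, additions).
    """
    groups, additions = {}, 0
    for c, xc in enumerate(x):
        col = [row[c] for row in A]
        lead = next((k for k in col if k is not None), None)
        if lead is None:                      # zero column: omit it with x_c
            continue
        norm = tuple(None if k is None else (k - lead) % 4 for k in col)
        coeff = UNIT[lead] * xc
        if norm in groups:
            groups[norm] += coeff
            additions += 1
        else:
            groups[norm] = coeff
    return list(groups), list(groups.values()), additions


def expand(columns, coeffs, r):
    """Evaluate sum_k t_k w_k, to check the merged form against A x."""
    y = [0] * r
    for col, t in zip(columns, coeffs):
        for i, k in enumerate(col):
            if k is not None:
                y[i] += UNIT[k] * t
    return y


# Columns 1 and 2 are equivalent (column 2 = j * column 1); column 4 is zero.
A = [[0, 1, None, None, 2],
     [1, 2, 3, None, 0]]
x = [1, 2, 3, 4, 5]
cols, t, adds = gem_merge_columns(A, x)
direct = [sum(UNIT[k] * xc for k, xc in zip(row, x) if k is not None) for row in A]
print(adds, expand(cols, t, 2) == direct)   # 1 True
```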

Example: Consider the product of the following 3×15 $U_2$-matrix A by a 15-dimensional complex input vector x. The matrix is given by:

[1 -j j -1 1 -7 "I j "I J "I 1 -J 1 J 1 
-l j -i j i j i i j -y -j -j -j -i ~ j I 
j i -l j j j i-ii -l -l -./ i j -iJ 



and the input vector is given by:

$$x=(x_1,x_2,\dots,x_{15})^T.$$

The result of the product is written as:

$$y=\begin{pmatrix}y_1\\ y_2\\ y_3\end{pmatrix}=A\cdot x.$$



All the additions done in this example are complex additions. The first step is a check for equivalent lines; none are found. In the next step the product $A\cdot x$ is decomposed into a sum of the columns of A multiplied by the respective x-components.
Next, each of the column vectors in the above sum is normalized. This is done by multiplying each vector by the inverse of its upper component and accordingly multiplying each coefficient by the (same) upper component of the corresponding vector. Hence the upper component of each modified vector is 1. No modification is needed if the upper component of the vector is already 1. The following normalized form of the above sum is derived:
In this normalized sum the upper component of each vector is 1, and the number of different vectors is thus reduced substantially when (in the general setting) the number of columns n is sufficiently large relative to the number of rows r (the requirement is $n>4^{r-1}$). The next step is to collect and sum the coefficients of the same column. Thus, at the cost of 7 additions, the following reduction is obtained:
New coefficients are now defined:

$$w_1=x_1+(-j)x_2+j\,x_{10}+x_{14}+j\,x_{15},\qquad w_2=j\,x_3,$$
$$w_3=(-1)x_4+x_{11},\qquad w_4=x_5+(-j)x_{13},\qquad w_5=(-j)x_6+(-1)x_7,$$
$$w_6=j\,x_8,\qquad w_7=(-1)x_9,\qquad w_8=(-1)x_{12}.$$

Hence the above equation may be written in the following form:
The coefficients $w_1,\dots,w_8$ are known to the processor at this stage. Next, this vector equation is split horizontally into the following equivalent set of two equations:
In general both are vector equations. However, in the present miniature example the second equation degenerates to a scalar equation. The rule now is that each of these equations is processed separately in the same manner as in the first iteration. The upper equation requires no normalization, since it inherits the normalized state of the previous equation; thus in any non-degenerate application normalization is required only for the second vector equation. Consequently the resulting reduction for the upper vector equation is:



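$$\begin{pmatrix}y_1\\ y_2\end{pmatrix}=(w_1+w_5)\begin{pmatrix}1\\ -1\end{pmatrix}+(w_2+w_8)\begin{pmatrix}1\\ j\end{pmatrix}+(w_3+w_6+w_7)\begin{pmatrix}1\\ -j\end{pmatrix}+w_4\begin{pmatrix}1\\ 1\end{pmatrix}$$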
Four more additions were required for this step. The computation is now completed in a straightforward manner, consuming 13 additional additions for both equations. Thus a total of 24 additions were used by the preferred embodiment of the invention in this example, whereas brute-force computation would have required 42 additions. On a larger scale the saving is far more substantial.



TOEPLITZ MATRIX

To demonstrate this preferred embodiment of the invention, an example is presented.

Example: Suppose that a sequence of length 8 is given, $u=(1,1,-1,-1,1,-1,1,-1)$, and it is desired to check 3 consecutive hypotheses on the data vector $x=(x_1,x_2,\dots,x_{10})$. The purpose might be a search for maximal correlation. The following three sums need to be calculated:

$$y_1=1\cdot x_1+1\cdot x_2+(-1)x_3+(-1)x_4+1\cdot x_5+(-1)x_6+1\cdot x_7+(-1)x_8$$
$$y_2=1\cdot x_2+1\cdot x_3+(-1)x_4+(-1)x_5+1\cdot x_6+(-1)x_7+1\cdot x_8+(-1)x_9$$
$$y_3=1\cdot x_3+1\cdot x_4+(-1)x_5+(-1)x_6+1\cdot x_7+(-1)x_8+1\cdot x_9+(-1)x_{10}$$

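This can be represented by the following Toeplitz matrix product:

$$\begin{pmatrix}y_1\\ y_2\\ y_3\end{pmatrix}=\begin{pmatrix}1&1&-1&-1&1&-1&1&-1&0&0\\ 0&1&1&-1&-1&1&-1&1&-1&0\\ 0&0&1&1&-1&-1&1&-1&1&-1\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_{10}\end{pmatrix}$$

$$=x_1\begin{pmatrix}1\\0\\0\end{pmatrix}+x_2\begin{pmatrix}1\\1\\0\end{pmatrix}+x_3\begin{pmatrix}-1\\1\\1\end{pmatrix}+x_4\begin{pmatrix}-1\\-1\\1\end{pmatrix}+x_5\begin{pmatrix}1\\-1\\-1\end{pmatrix}+x_6\begin{pmatrix}-1\\1\\-1\end{pmatrix}+x_7\begin{pmatrix}1\\-1\\1\end{pmatrix}+x_8\begin{pmatrix}-1\\1\\-1\end{pmatrix}+x_9\begin{pmatrix}0\\-1\\1\end{pmatrix}+x_{10}\begin{pmatrix}0\\0\\-1\end{pmatrix}$$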



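The next step is to gather complementing vectors, thus:

$$y=\Big(x_1\begin{pmatrix}1\\0\\0\end{pmatrix}+x_9\begin{pmatrix}0\\-1\\1\end{pmatrix}\Big)+\Big(x_2\begin{pmatrix}1\\1\\0\end{pmatrix}+x_{10}\begin{pmatrix}0\\0\\-1\end{pmatrix}\Big)+x_3\begin{pmatrix}-1\\1\\1\end{pmatrix}+x_4\begin{pmatrix}-1\\-1\\1\end{pmatrix}+x_5\begin{pmatrix}1\\-1\\-1\end{pmatrix}+x_6\begin{pmatrix}-1\\1\\-1\end{pmatrix}+x_7\begin{pmatrix}1\\-1\\1\end{pmatrix}+x_8\begin{pmatrix}-1\\1\\-1\end{pmatrix}$$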



At this point consider the term inside each of the brackets and make use of the lemma that if x, y are scalars and v, u are vectors then:

$$xv+yu=\tfrac12(x+y)(v+u)+\tfrac12(x-y)(v-u),$$

hence:





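$$x_1\begin{pmatrix}1\\0\\0\end{pmatrix}+x_9\begin{pmatrix}0\\-1\\1\end{pmatrix}=\tfrac12(x_1+x_9)\begin{pmatrix}1\\-1\\1\end{pmatrix}+\tfrac12(x_1-x_9)\begin{pmatrix}1\\1\\-1\end{pmatrix},$$

and similarly:

$$x_2\begin{pmatrix}1\\1\\0\end{pmatrix}+x_{10}\begin{pmatrix}0\\0\\-1\end{pmatrix}=\tfrac12(x_2+x_{10})\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2-x_{10})\begin{pmatrix}1\\1\\1\end{pmatrix}.$$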

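Hence, at the cost of 4 additions, the outcome is:

$$\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}=\tfrac12(x_1+x_9)\begin{pmatrix}1\\-1\\1\end{pmatrix}+\tfrac12(x_1-x_9)\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2+x_{10})\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2-x_{10})\begin{pmatrix}1\\1\\1\end{pmatrix}+x_3\begin{pmatrix}-1\\1\\1\end{pmatrix}+x_4\begin{pmatrix}-1\\-1\\1\end{pmatrix}+x_5\begin{pmatrix}1\\-1\\-1\end{pmatrix}+x_6\begin{pmatrix}-1\\1\\-1\end{pmatrix}+x_7\begin{pmatrix}1\\-1\\1\end{pmatrix}+x_8\begin{pmatrix}-1\\1\\-1\end{pmatrix}$$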


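but now the above U-binary method can be applied. Thus the next step is a normalization that makes the upper component of each column equal to 1:

$$\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}=\tfrac12(x_1+x_9)\begin{pmatrix}1\\-1\\1\end{pmatrix}+\tfrac12(x_1-x_9)\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2+x_{10})\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2-x_{10})\begin{pmatrix}1\\1\\1\end{pmatrix}+(-x_3)\begin{pmatrix}1\\-1\\-1\end{pmatrix}+(-x_4)\begin{pmatrix}1\\1\\-1\end{pmatrix}+x_5\begin{pmatrix}1\\-1\\-1\end{pmatrix}+(-x_6)\begin{pmatrix}1\\-1\\1\end{pmatrix}+x_7\begin{pmatrix}1\\-1\\1\end{pmatrix}+(-x_8)\begin{pmatrix}1\\-1\\1\end{pmatrix}$$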



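The next step is to collect and sum the coefficients of common columns. Thus, at the cost of 6 additional additions, we arrive at the equality:

$$\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}=\Big(\tfrac12(x_1+x_9)+(-x_6)+x_7+(-x_8)\Big)\begin{pmatrix}1\\-1\\1\end{pmatrix}+\Big(\tfrac12(x_1-x_9)+\tfrac12(x_2+x_{10})+(-x_4)\Big)\begin{pmatrix}1\\1\\-1\end{pmatrix}+\tfrac12(x_2-x_{10})\begin{pmatrix}1\\1\\1\end{pmatrix}+\big((-x_3)+x_5\big)\begin{pmatrix}1\\-1\\-1\end{pmatrix}.$$

Thus, if we define for convenience:

$$w_1=\tfrac12(x_1+x_9)+(-x_6)+x_7+(-x_8),\qquad w_2=\tfrac12(x_1-x_9)+\tfrac12(x_2+x_{10})+(-x_4),$$
$$w_3=\tfrac12(x_2-x_{10}),\qquad w_4=(-x_3)+x_5,$$

then the equality is written as:

$$\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}=w_1\begin{pmatrix}1\\-1\\1\end{pmatrix}+w_2\begin{pmatrix}1\\1\\-1\end{pmatrix}+w_3\begin{pmatrix}1\\1\\1\end{pmatrix}+w_4\begin{pmatrix}1\\-1\\-1\end{pmatrix}$$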



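In the next stage this vector equation is split horizontally into the following set of two equations:

$$\begin{pmatrix}y_1\\y_2\end{pmatrix}=w_1\begin{pmatrix}1\\-1\end{pmatrix}+w_2\begin{pmatrix}1\\1\end{pmatrix}+w_3\begin{pmatrix}1\\1\end{pmatrix}+w_4\begin{pmatrix}1\\-1\end{pmatrix},$$

$$y_3=w_1+(-w_2)+w_3+(-w_4).$$

Collecting identical columns in the first equation, we obtain with two more additions:

$$\begin{pmatrix}y_1\\y_2\end{pmatrix}=(w_1+w_4)\begin{pmatrix}1\\-1\end{pmatrix}+(w_2+w_3)\begin{pmatrix}1\\1\end{pmatrix}$$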



The desired output vector y can now be found with 5 additional additions. Thus the method embodying the invention required 17 additions for this computation, whereas a brute-force method would require 21 additions. Clearly the gain obtained in this miniature example is not large, but, as we shall see, the gain in large-scale applications is comparable to the gain of the above "square" (non-Toeplitz) U-method.
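The 17-addition schedule of the example can be checked numerically. In this sketch the halving is done with exact `fractions.Fraction` values (an assumption for the check; in hardware it would be a shift), and the comments tally the additions of each stage.

```python
from fractions import Fraction

u = [1, 1, -1, -1, 1, -1, 1, -1]


def direct(x):
    """Brute force: y_i = sum_k u_k x_{i+k} for the three hypotheses (21 additions)."""
    return [sum(u[k] * x[i + k] for k in range(8)) for i in range(3)]


def fast(x):
    """The 17-addition schedule of the example (halving is not counted)."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10 = map(Fraction, x)
    half = Fraction(1, 2)
    # 4 additions: pair the side columns (x1 with x9, x2 with x10)
    p1, m1 = x1 + x9, x1 - x9
    p2, m2 = x2 + x10, x2 - x10
    # 6 additions: merge coefficients of the four normalized columns
    w1 = half * p1 + (-x6) + x7 + (-x8)     # column (1, -1, 1)
    w2 = half * m1 + half * p2 + (-x4)      # column (1, 1, -1)
    w3 = half * m2                          # column (1, 1, 1)
    w4 = (-x3) + x5                         # column (1, -1, -1)
    # 2 additions: merge equal columns after splitting off y3
    a, b = w1 + w4, w2 + w3                 # columns (1, -1) and (1, 1)
    # 5 additions: one each for y1 and y2, three for y3
    return [a + b, b - a, w1 - w2 + w3 - w4]


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
print(fast(x) == direct(x))   # True
```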

Next is a description of the general setup of this aspect of the invention. Suppose a U-sequence is given:

$$u_1,u_2,\dots,u_n$$

and a sequence of data of real or complex scalars:

$$x_1,x_2,\dots,x_{n+m-1}.$$

The elements of the data sequence may be real or complex or belong to any field of scalars of characteristic larger than 2. Suppose that it is desired to calculate the following sums:

$$y_1=u_1x_1+u_2x_2+\cdots+u_nx_n$$
$$y_2=u_1x_2+u_2x_3+\cdots+u_nx_{n+1}$$
$$\vdots$$
$$y_m=u_1x_m+u_2x_{m+1}+\cdots+u_nx_{n+m-1}$$

If $m\le\log_2(n)$ then let $r=m$; otherwise, if $m>\log_2(n)$, a preferred embodiment of the invention starts by dividing the set of m sums into blocks of r consecutive sums, where $r\le\log_2(n)$. All the blocks are treated in an identical way, so the method may be demonstrated on the first block. The first r sums can be represented by a product of a Toeplitz matrix by a vector. Consider then the r×(n+r-1) Toeplitz matrix:

$$A=\begin{pmatrix}u_1&u_2&\cdots&u_n&0&\cdots&0\\ 0&u_1&u_2&\cdots&u_n&\ddots&\vdots\\ \vdots&\ddots&\ddots&\ddots&&\ddots&0\\ 0&\cdots&0&u_1&u_2&\cdots&u_n\end{pmatrix}$$

and the (n+r-1)-dimensional vector:

$$x=(x_1,x_2,\dots,x_{n+r-1})^T.$$

Then the first r sums are given by the vector:

$$y=(y_1,y_2,\dots,y_r)^T,$$

where:

$$y=A\cdot x.$$
The underlying idea of this preferred embodiment of the invention is to reorganize the given problem so that the above U-method becomes applicable. Let $v_1,v_2,\dots,v_{n+r-1}$ be the columns of the matrix A in their order. Observe that all the following "middle" column vectors are U-vectors:

$$v_r=\begin{pmatrix}u_r\\u_{r-1}\\\vdots\\u_1\end{pmatrix},\quad v_{r+1}=\begin{pmatrix}u_{r+1}\\u_r\\\vdots\\u_2\end{pmatrix},\quad\dots,\quad v_n=\begin{pmatrix}u_n\\u_{n-1}\\\vdots\\u_{n-r+1}\end{pmatrix}.$$

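Note also that the following sums and differences of "matching" pairs of "side" column vectors are U-vectors. For $1\le j\le r-1$, the column $v_j$ has the entries $u_j,u_{j-1},\dots,u_1$ in its first j positions and zeros below them, while $v_{n+j}$ has zeros in its first j positions and the entries $u_n,u_{n-1},\dots,u_{n-r+j+1}$ below them. Hence:

$$v_j\pm v_{n+j}=\begin{pmatrix}u_j\\\vdots\\u_1\\\pm u_n\\\vdots\\\pm u_{n-r+j+1}\end{pmatrix},\qquad 1\le j\le r-1.$$

For example:

$$v_1\pm v_{n+1}=\begin{pmatrix}u_1\\\pm u_n\\\vdots\\\pm u_{n-r+2}\end{pmatrix},\quad v_2\pm v_{n+2}=\begin{pmatrix}u_2\\u_1\\\pm u_n\\\vdots\\\pm u_{n-r+3}\end{pmatrix},\quad\dots,\quad v_{r-1}\pm v_{n+r-1}=\begin{pmatrix}u_{r-1}\\\vdots\\u_1\\\pm u_n\end{pmatrix}.$$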

After these necessary preliminary preparations, the method may be introduced. Rearranging in accordance with the above:

$$A\cdot x=\sum_{1\le j\le n+r-1}x_jv_j=\sum_{1\le j\le r-1}\big(x_jv_j+x_{n+j}v_{n+j}\big)+\sum_{r\le j\le n}x_jv_j.$$

Next the invention uses the rule that if x, y are scalars and v, u are vectors then:

$$xv+yu=\tfrac12(x+y)(v+u)+\tfrac12(x-y)(v-u).$$

Thus:

$$A\cdot x=\sum_{1\le j\le r-1}\Big[\tfrac12(x_j+x_{n+j})(v_j+v_{n+j})+\tfrac12(x_j-x_{n+j})(v_j-v_{n+j})\Big]+\sum_{r\le j\le n}x_jv_j.$$

This process costs $2r-2$ additions, and the form obtained is in fact a product of an r×(n+r-1) U-matrix by an (n+r-1)-dimensional vector. This is so since, by the above, all the vectors

$$v_r,v_{r+1},\dots,v_n,\quad v_1+v_{n+1},\dots,v_{r-1}+v_{n+r-1},\quad v_1-v_{n+1},\dots,v_{r-1}-v_{n+r-1}$$

are U-vectors. Thus the rest of the computation may be done by the U-method. All the aspects of this modification to U-form, except for the actual sums and differences $x_j\pm x_{n+j}$, are done as a "one-time job" prior to the arrival of the input vectors.
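A sketch of the general reduction, assuming 0-based indexing in code (column j of the code is $v_{j+1}$ in the text) and a hypothetical function name. It pairs each side column with its matching partner, keeps the middle columns, and returns a ±1 matrix (as a list of columns) together with the new coefficient vector, so that any U-method routine could finish the job.

```python
from fractions import Fraction


def toeplitz_to_u(u, x, r):
    """Rewrite the first r sliding sums y_i = sum_k u_k x_{i+k} (0-based) as
    B z, where B is an r x (n+r-1) +/-1 matrix given by its columns.

    Side columns v_j and v_{n+j} (j = 0..r-2) are replaced by v_j + v_{n+j}
    and v_j - v_{n+j} with coefficients (x_j +/- x_{n+j}) / 2; the middle
    columns v_{r-1}..v_{n-1} are kept.  Returns (columns, z, additions).
    """
    n = len(u)
    if r > n or len(x) < n + r - 1:
        raise ValueError("need r <= n and len(x) >= n + r - 1")

    def column(j):   # column j of the r x (n+r-1) Toeplitz matrix
        return [u[j - i] if 0 <= j - i < n else 0 for i in range(r)]

    columns, z, additions = [], [], 0
    for j in range(r - 1):
        v, w = column(j), column(n + j)          # disjoint supports
        columns.append([a + b for a, b in zip(v, w)])
        columns.append([a - b for a, b in zip(v, w)])
        z.append(Fraction(x[j] + x[n + j], 2))   # halving is a shift, not an addition
        z.append(Fraction(x[j] - x[n + j], 2))
        additions += 2
    for j in range(r - 1, n):
        columns.append(column(j))
        z.append(Fraction(x[j]))
    return columns, z, additions


u = [1, 1, -1, -1, 1, -1, 1, -1]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
r = 3
cols, z, adds = toeplitz_to_u(u, x, r)
y = [sum(c[i] * zc for c, zc in zip(cols, z)) for i in range(r)]
direct = [sum(u[k] * x[i + k] for k in range(len(u))) for i in range(r)]
print(adds, len(cols), y == direct)   # 4 10 True
```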

The number of additions in the worst case is given by the expression:

$$s_t(n,r)=s(n+r-1,r)+2r-2=n+3r-3+s(r).$$

In the general setting, where m sums are computed and $m>\log_2(n)$, the number of additions in the worst case is bounded by $(1+\varepsilon)\,(n+3\log_2(n))\cdot m/\log_2(n)$, where $0<\varepsilon<1$ and $\varepsilon$ goes to zero when both m and n go to infinity.

According to another preferred embodiment of the invention, components of the GEM may be integrated into the above method. In such a case some of the "side" columns remain unprocessed by the first stage of coupling side columns.
(0,1,-1)-MATRIX

The Toeplitz matrix is part of a more general class of matrices: those whose entries are 0, 1 or -1. Such a matrix is called here a (0,1,-1)-matrix. A broader version of the ideas related to the above Toeplitz method is now developed. Suppose that A is a (0,1,-1)-matrix of dimensions r×n, x is an n-dimensional input vector, and it is desired to compute the product $A\cdot x$. This product may be expressed as a sum of the columns of A multiplied by the corresponding x-components. Thus, denoting by $v_j$ (for every j) the j-th column of A:

$$A\cdot x=x_1v_1+x_2v_2+\cdots+x_nv_n.$$

Some, or possibly all, of the columns of A contain 0-entries. For the sake of notational simplicity it may be assumed that the indexes are arranged (methodically, of course, without tasking the processor) so that each of the first k columns $v_1,v_2,\dots,v_k$ contains a zero entry, and all the remaining $n-k$ columns (if there are any) $v_{k+1},v_{k+2},\dots,v_n$ are U-vectors (that is, without 0 entries). Each of the "mixed" vectors $v_1,v_2,\dots,v_k$ is an average of 2 U-vectors: replacing its zeros by +1 gives one of them, and replacing its zeros by -1 gives the other. Hence for each $1\le j\le k$ there exist two U-vectors $u_j,w_j$ such that $v_j=\tfrac12(u_j+w_j)$. A preferred embodiment of the invention finds the vectors $u_1,w_1,\dots,u_k,w_k$ as a preliminary once-per-matrix preparation. Therefore, by the above:

$$A\cdot x=\tfrac12x_1(u_1+w_1)+\tfrac12x_2(u_2+w_2)+\cdots+\tfrac12x_k(u_k+w_k)+x_{k+1}v_{k+1}+\cdots+x_nv_n.$$

By opening the brackets we find a representation of a product of an r×(n+k) U-matrix by an (n+k)-dimensional vector. According to a preferred embodiment of the invention, this task is done by the above U-aspect of the invention. The above Toeplitz aspect of the invention is a special case of this broader aspect.
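The splitting of mixed columns is a once-per-matrix preparation and can be sketched as follows (hypothetical names; the matrix is assumed to be a list of rows with entries in $\{0,1,-1\}$). Each mixed column produces two U-columns with weight ½, so an r×n matrix with k mixed columns becomes an r×(n+k) U-matrix.

```python
from fractions import Fraction


def split_mixed_columns(A):
    """Turn a (0,1,-1)-matrix (list of rows) into an equivalent U-form.

    A column v with a zero entry is written as (u + w)/2, where u replaces
    every 0 by +1 and w replaces every 0 by -1.  Returns a list of
    (j, column, weight) triples with A x = sum(weight * x_j * column).
    """
    r, n = len(A), len(A[0])
    terms = []
    for j in range(n):
        v = [A[i][j] for i in range(r)]
        if 0 in v:
            u = [a if a else 1 for a in v]
            w = [a if a else -1 for a in v]
            terms.append((j, u, Fraction(1, 2)))
            terms.append((j, w, Fraction(1, 2)))
        else:
            terms.append((j, v, Fraction(1)))
    return terms


def apply_terms(terms, x, r):
    """Evaluate the U-form; used here only to check it against A x."""
    return [sum(wt * x[j] * col[i] for j, col, wt in terms) for i in range(r)]


A = [[1, 0, -1, 1],
     [0, -1, 1, 1],
     [1, 1, 0, -1]]
x = [2, -3, 5, 7]
terms = split_mixed_columns(A)
direct = [sum(a * b for a, b in zip(row, x)) for row in A]
print(len(terms), apply_terms(terms, x, 3) == direct)   # 7 True
```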

EXAMPLES OF INDUSTRIAL APPLICATIONS 
The searcher is usually represented by the product of a Toeplitz matrix by a data vector. This is so both in CDMA and in wideband CDMA.

REAL MATRIX 

According to another preferred embodiment of the invention, the number of computational operations may be reduced for any real linear transformation by the use of distributed arithmetic. In the following text it will be called the "real-matrix-method". The linear transformation is represented by a matrix whose entries are real numbers, not necessarily




integers, having a fixed number of binary digits. The matrix is decomposed into a sum of binary matrices with power-of-2 coefficients. In the next stage the binary embodiments of the invention are applied. To introduce this method, consider the following example.

Example: Consider the following 3×8 matrix A, having (for the sake of simplicity of this example) integer entries, written in the decimal basis:

$$A=\begin{pmatrix}2&1&5&4&3&0&4&5\\5&0&4&1&4&7&1&2\\2&7&3&1&4&5&0&3\end{pmatrix}$$

The 8-dimensional input vector is given by:

$$x=(x_1,x_2,\dots,x_8)^T.$$

It is desired to compute the 3-dimensional output vector:

$$y=\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix}=A\cdot x.$$



Since processors normally work in binary, the matrix A is represented in this basis and is thus given by:

$$A=\begin{pmatrix}010&001&101&100&011&000&100&101\\101&000&100&001&100&111&001&010\\010&111&011&001&100&101&000&011\end{pmatrix}$$



This representation suggests the possibility of using distributed arithmetic to express the matrix as a sum of three binary matrices:

$$A=2^0\cdot A[0]+2^1\cdot A[1]+2^2\cdot A[2],$$

where:

$$A[0]=\begin{pmatrix}0&1&1&0&1&0&0&1\\1&0&0&1&0&1&1&0\\0&1&1&1&0&1&0&1\end{pmatrix},$$

$$A[1]=\begin{pmatrix}1&0&0&0&1&0&0&0\\0&0&0&0&0&1&0&1\\1&1&1&0&0&0&0&1\end{pmatrix},$$

$$A[2]=\begin{pmatrix}0&0&1&1&0&0&1&1\\1&0&1&0&1&1&0&0\\0&1&0&0&1&1&0&0\end{pmatrix}.$$



Thus the first step taken by this embodiment of the invention is the construction of the binary matrices A[0], A[1], A[2], comprising the bits of the entries of A according to their significance. The preferred embodiment under consideration is a consequence of the equality:

$$A\cdot x=2^0\cdot A[0]\cdot x+2^1\cdot A[1]\cdot x+2^2\cdot A[2]\cdot x.$$

The next step is to compose the 3×24 binary matrix A* formed by horizontal chaining of A[0], A[1], A[2]:

$$A^*=\left(\begin{array}{cccccccc|cccccccc|cccccccc}0&1&1&0&1&0&0&1&1&0&0&0&1&0&0&0&0&0&1&1&0&0&1&1\\1&0&0&1&0&1&1&0&0&0&0&0&0&1&0&1&1&0&1&0&1&1&0&0\\0&1&1&1&0&1&0&1&1&1&1&0&0&0&0&1&0&1&0&0&1&1&0&0\end{array}\right)$$

and also to form the 24-dimensional column vector x*, comprised of 3 replicas of the vector x, each with the corresponding binary weight deduced from the above sum. This is done before the arrival of input vectors begins.

$$x^*=(2^0x_1,\dots,2^0x_8,\;2^1x_1,\dots,2^1x_8,\;2^2x_1,\dots,2^2x_8)^T$$

Note that multiplying by $2^i$ requires only a shift operation. The key to this embodiment is the assertion that:

$$y=A\cdot x=A^*\cdot x^*.$$

Thus, in what follows, a computation of $A^*\cdot x^*$ through the 0-1 method takes place. Hence the next job is to collect and sum the coefficients of common nonzero columns, creating new coefficients and thereby reducing the size of the matrix.

Hence it holds that:

$$y=w_1\begin{pmatrix}1\\0\\0\end{pmatrix}+w_2\begin{pmatrix}0\\1\\0\end{pmatrix}+w_3\begin{pmatrix}1\\1\\0\end{pmatrix}+w_4\begin{pmatrix}0\\0\\1\end{pmatrix}+w_5\begin{pmatrix}1\\0\\1\end{pmatrix}+w_6\begin{pmatrix}0\\1\\1\end{pmatrix},$$

where:

$$w_1=2^2x_4+2^2x_7+2^2x_8+2^1x_5+2^0x_5$$
$$w_2=2^2x_1+2^1x_6+2^0x_1+2^0x_7$$
$$w_3=2^2x_3$$
$$w_4=2^2x_2+2^1x_2+2^1x_3$$
$$w_5=2^1x_1+2^0x_2+2^0x_3+2^0x_8$$
$$w_6=2^2x_5+2^2x_6+2^1x_8+2^0x_4+2^0x_6$$

This was done at a cost of 16 additions. Next, a row splitting of the resulting matrix takes place. Hence:

$$\begin{pmatrix}y_1\\y_2\end{pmatrix}=w_1\begin{pmatrix}1\\0\end{pmatrix}+w_2\begin{pmatrix}0\\1\end{pmatrix}+w_3\begin{pmatrix}1\\1\end{pmatrix}+w_5\begin{pmatrix}1\\0\end{pmatrix}+w_6\begin{pmatrix}0\\1\end{pmatrix}$$

and:

$$y_3=w_4+w_5+w_6.$$

Now, summing the coefficients of common columns in the first equation yields:

$$\begin{pmatrix}y_1\\y_2\end{pmatrix}=(w_1+w_5)\begin{pmatrix}1\\0\end{pmatrix}+(w_2+w_6)\begin{pmatrix}0\\1\end{pmatrix}+w_3\begin{pmatrix}1\\1\end{pmatrix}.$$

This step costs 2 additions. The rest is done with 4 more additions (2 for $y_1,y_2$ and 2 for $y_3$). In total, 22 addition operations were required to compute the transformation in this example. The use of conventional methods would have required the equivalent of 29 addition operations, where by "equivalent" we mean that the additions comprising the product operations are also taken into account.
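The bit-plane construction and the first merging step of this example can be reproduced with the sketch below. It assumes nonnegative integer entries and integer inputs for the check; `bit_plane` and `merge_bit_columns` are hypothetical names. It recovers the six distinct nonzero columns of $A^*$, the 16 additions spent on merging them, and the correct product.

```python
A = [[2, 1, 5, 4, 3, 0, 4, 5],
     [5, 0, 4, 1, 4, 7, 1, 2],
     [2, 7, 3, 1, 4, 5, 0, 3]]


def bit_plane(A, k):
    """A[k]: the 0-1 matrix of the k-th bits of the (nonnegative integer) entries."""
    return [[(a >> k) & 1 for a in row] for row in A]


def merge_bit_columns(A, x, bits=3):
    """Group equal nonzero columns of A* = [A[0], A[1], A[2]] acting on
    x* = (2^0 x, 2^1 x, 2^2 x).  Returns ({column: w}, additions)."""
    r, n = len(A), len(A[0])
    groups, additions = {}, 0
    for k in range(bits):
        P = bit_plane(A, k)
        for j in range(n):
            col = tuple(P[i][j] for i in range(r))
            if not any(col):               # zero column of A*: skipped
                continue
            term = (2 ** k) * x[j]         # a shift by k bits in hardware
            if col in groups:
                groups[col] += term
                additions += 1
            else:
                groups[col] = term
    return groups, additions


x = [1, 2, 3, 4, 5, 6, 7, 8]
groups, adds = merge_bit_columns(A, x)
y = [sum(w * col[i] for col, w in groups.items()) for i in range(3)]
direct = [sum(a * b for a, b in zip(row, x)) for row in A]
print(len(groups), adds, y == direct)   # 6 16 True
```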

Obviously this is not an example of outstanding saving. It is rather an easy 
introduction to the idea of this embodiment of the invention. However substantial saving is 
accomplished by the invention when the parameters of the matrix (dimensions and number of 
digits for each entry) are large. 

In general, the preferred embodiment of the invention under consideration relates to an efficient computation of a real, unrestricted, linear transformation. The r×n matrix A representing the transformation is written as:

$$A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\ \vdots&\vdots&&\vdots\\ a_{r1}&a_{r2}&\cdots&a_{rn}\end{pmatrix},$$

where the entries of the matrix are real numbers. It is desired to compute $A\cdot x$, where x is an n-dimensional vector of real or complex scalar entries written as:

$$x=(x_1,x_2,\dots,x_n)^T.$$



It is assumed that the entries of the matrix A are written in binary with fixed numbers of digits before and after the point. Two preferred embodiments of the invention are presented as optional solutions, and the choice between them depends on the structure of the transformation matrix. The first is called the 0-1-decomposition, the second the U-decomposition.

It is presumed that the entries of the matrix A are bounded by $2^{m_1}$ and have an accuracy of $m_2$ digits beyond the point, for appropriate positive integers $m_1$ and $m_2$. This assumption does not restrict the scope of the invention, since every scalar encountered by a processor has a finite number of digits before and after the point. Put $m=m_1+m_2+1$. Distributed arithmetic is applied to break the product $A\cdot x$ into a sum of m binary products of the type treated in the example above.

First the invention is described in the case that the entries of the matrix are nonnegative and the decomposition is 0-1-based. A real number t bounded by $2^{m_1}$ and having $m_2$ digits beyond the point may be represented in the following way:

$$t = \sum_{-m_2 \le k \le m_1} t_k \cdot 2^k$$

where each $t_k$ is 0 or 1. This is the standard binary representation. By a typical distributed arithmetic argument, A can be decomposed into a sum of m $r \times n$ 0-1-binary-matrices in the following way:

$$A = \sum_{-m_2 \le k \le m_1} 2^k \cdot A[k]$$



In this sum $A[m_1]$ is the 0-1-matrix of the most significant bits (MSB) and $A[-m_2]$ is the 0-1-matrix of the least significant bits (LSB). In general, each matrix $A[k]$ is a 0-1-matrix whose entries are the k-th significant bits of the entries of A, where each bit is placed in its corresponding entry. By the distributive rule it follows that:

$$A \cdot x = \sum_{-m_2 \le k \le m_1} 2^k \cdot A[k] \cdot x$$

This is the basis for the current embodiment of the invention.
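The bit-plane decomposition can be checked numerically. The following Python sketch is a minimal illustration, not the patented apparatus. It assumes entries given as exact `Fraction` values that are multiples of $2^{-m_2}$, and the helper names `bit_planes` and `matvec` are hypothetical. It extracts each 0-1 matrix A[k] and confirms that $\sum_k 2^k A[k] x$ equals $A x$.

```python
from fractions import Fraction

def bit_planes(A, m1, m2):
    """Split a nonnegative fixed-point matrix into 0-1 bit planes A[k], -m2 <= k <= m1.

    Assumes each entry is a multiple of 2**-m2 and smaller than 2**(m1 + 1).
    Returns a dict mapping k to the 0-1 matrix A[k] (a list of lists).
    """
    planes = {}
    for k in range(-m2, m1 + 1):
        shift = k + m2  # bit position of 2**k inside the integer a * 2**m2
        planes[k] = [[(int(a * 2 ** m2) >> shift) & 1 for a in row] for row in A]
    return planes

def matvec(M, x):
    """Plain matrix-by-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

# Hypothetical example with m1 = 1, m2 = 2 (entries below 4, resolution 1/4).
A = [[Fraction(5, 4), Fraction(3)], [Fraction(0), Fraction(7, 2)]]
x = [Fraction(2), Fraction(-1)]
planes = bit_planes(A, m1=1, m2=2)

# A x = sum_k 2**k (A[k] x): only 0-1 products plus power-of-two scalings.
y = [Fraction(0)] * len(A)
for k, P in planes.items():
    y = [yi + Fraction(2) ** k * ti for yi, ti in zip(y, matvec(P, x))]

assert y == matvec(A, x)
```

In hardware, each scaling by $2^k$ would be a shift rather than a multiplication.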

Next let $A^*$ be the $r \times (m \cdot n)$ 0-1-matrix composed by lining up horizontally the binary matrices of the above sum, in the order in which they appear in the sum. Then:

$$A^* = [A[-m_2], A[-m_2+1], \ldots, A[0], \ldots, A[m_1]]$$

This is done one time for any given matrix, before the arrival of input vectors begins. In addition let the $(m \cdot n)$-dimensional column vector $x^*$ be given by:

$$x^* = \begin{pmatrix} 2^{-m_2} x \\ 2^{-m_2+1} x \\ \vdots \\ 2^{0} x \\ \vdots \\ 2^{m_1} x \end{pmatrix}$$

The key observation underlying the embodiment under consideration is that:

$$A \cdot x = A^* \cdot x^*$$

The latter is a product of an $r \times (m \cdot n)$ 0-1-matrix by an $(m \cdot n)$-dimensional vector, which is composed of m shifted replicas of the original vector.

The computation of the product $A^* \cdot x^*$ is done with the 0-1-binary embodiment of the invention. Since each multiplication by an integer power of 2 may be implemented by shifting bits, very little additional complexity is added.
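A minimal Python sketch of the horizontal construction follows, under the same assumptions as before: nonnegative `Fraction` entries that are multiples of $2^{-m_2}$. The name `horizontal_system` is hypothetical. The sketch lines up the bit planes side by side to form $A^*$, stacks the scaled copies of x to form $x^*$, and checks that $A^* x^* = A x$. A plain product stands in for the 0-1-binary embodiment.

```python
from fractions import Fraction

def bit_planes(A, m1, m2):
    """0-1 planes A[-m2], ..., A[m1] of a nonnegative matrix (entries multiples of 2**-m2)."""
    return [[[(int(a * 2 ** m2) >> (k + m2)) & 1 for a in row] for row in A]
            for k in range(-m2, m1 + 1)]

def horizontal_system(A, x, m1, m2):
    """Return A* = [A[-m2], ..., A[m1]] (r x m*n) and x* = (2^-m2 x, ..., 2^m1 x)."""
    planes = bit_planes(A, m1, m2)
    # Line i of A* is line i of every plane, concatenated in the order of the sum.
    A_star = [sum((P[i] for P in planes), []) for i in range(len(A))]
    # In hardware 2**k * x is a shift; here it is an exact rational scaling.
    x_star = [Fraction(2) ** k * xi for k in range(-m2, m1 + 1) for xi in x]
    return A_star, x_star

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

A = [[Fraction(5, 4), Fraction(3)], [Fraction(0), Fraction(7, 2)]]
x = [Fraction(2), Fraction(-1)]
A_star, x_star = horizontal_system(A, x, m1=1, m2=2)
assert matvec(A_star, x_star) == matvec(A, x)   # A x = A* x*
```

Building $A^*$ depends only on the matrix, which is why the text performs it once, before any input vector arrives.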

This embodiment of the invention may significantly reduce the complexity of the product. Let

$$l = \log_2(m \cdot n) = \log_2(m) + \log_2(n)$$

and let C(A) be the number of additions required by the invention to compute $A^* \cdot x^*$. If $l > r$ then:

$$C(A) \le m n + 2^{r} + 2^{r/2+1} - r$$

In particular, in this case the cruder bound $C(A) < 2mn$ holds.

When the assumption $l > r$ is dropped, it holds that:

$$C(A) \le (1+\delta) \cdot \frac{m \cdot r \cdot n}{l}$$

where $0 < \delta < 1$ and $\delta$ goes to zero as $m n$ and r go to infinity.

The conventional (brute force) prior art method of computing the product $A \cdot x$ requires, on average, the equivalent of $r \cdot n \cdot m / 2$ additions, where the additions comprised in the product operations are taken into account.

In some cases, particularly when r = 1 or when r is rather small, another variation of the above embodiment may be preferred, to enable further saving. Note that in the case r = 1 the problem is reduced to a scalar product between two vectors. This is itself a rather common computation in science and technology, and its efficient performance is of much use. According to this variation, a matrix $A^{**}$ is formed by chaining the matrices $A[-m_2], A[-m_2+1], \ldots, A[0], \ldots, A[m_1]$ in a vertical sequence. Thus it is the $(m \cdot r) \times n$ 0-1-matrix given by:

$$A^{**} = \begin{pmatrix} A[-m_2] \\ \vdots \\ A[0] \\ \vdots \\ A[m_1] \end{pmatrix}$$

Next, the computation of $A^{**} \cdot x$ is done by the preferred embodiment of the invention for 0-1-matrices. The product $A^{**} \cdot x$ contains all the products $A[-m_2] \cdot x, \ldots, A[0] \cdot x, \ldots, A[m_1] \cdot x$. Hence, using shifts for the binary multiplications, the desired result is achieved by performing the sum:

$$y = \sum_{-m_2 \le k \le m_1} 2^k \cdot A[k] \cdot x$$

This concludes the part dealing with matrices with nonnegative entries.
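The vertical variant can be sketched in the same way. In the hypothetical `vertical_product` helper below, the stacked matrix $A^{**}$ is multiplied by x once, with a plain product standing in for the 0-1-matrix embodiment. The m blocks of the result are then combined with the weights $2^k$, which in hardware would be shifts. The example is a scalar product (r = 1) with made-up data.

```python
from fractions import Fraction

def vertical_product(A, x, m1, m2):
    """Compute A x via the stacked (m*r) x n 0-1-matrix A** = [A[-m2]; ...; A[m1]].

    Assumes nonnegative entries that are multiples of 2**-m2 and below 2**(m1 + 1).
    """
    r = len(A)
    ks = range(-m2, m1 + 1)
    A_vert = [[(int(a * 2 ** m2) >> (k + m2)) & 1 for a in row] for k in ks for row in A]
    z = [sum(a * xi for a, xi in zip(row, x)) for row in A_vert]   # holds every A[k] x
    # Block k of z (rows (k + m2) * r ... (k + m2) * r + r - 1) is scaled by 2**k and accumulated.
    return [sum(Fraction(2) ** k * z[(k + m2) * r + i] for k in ks) for i in range(r)]

# Scalar product (r = 1) with hypothetical data.
a = [[Fraction(3, 2), Fraction(1, 4), Fraction(2)]]
x = [Fraction(4), Fraction(8), Fraction(-1)]
assert vertical_product(a, x, m1=1, m2=2) == [Fraction(6)]
```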
When the matrix A is real, having entries of both signs, it can be expressed as the difference of two matrices with nonnegative entries: $A = A_1 - A_2$. Following this representation, the aspect of the invention under discussion carries out the calculation of $y = A \cdot x$ by first computing each of the products $y_1 = A_1 \cdot x$ and $y_2 = A_2 \cdot x$, separately or in a combined form, by the above method. The final step is then the subtraction $y = y_1 - y_2$.
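As a small illustration, the sign split can be taken entrywise as $A_1 = \max(A, 0)$ and $A_2 = \max(-A, 0)$. This is one natural choice, assumed here, among the possible splits into nonnegative matrices. The sketch below uses the hypothetical helper `split_by_sign` and checks that $y_1 - y_2 = A x$.

```python
from fractions import Fraction

def split_by_sign(A):
    """A = A1 - A2 with A1 = max(A, 0) and A2 = max(-A, 0) taken entrywise."""
    A1 = [[max(a, 0) for a in row] for row in A]
    A2 = [[max(-a, 0) for a in row] for row in A]
    return A1, A2

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

A = [[Fraction(-3, 4), Fraction(2)], [Fraction(1, 2), Fraction(-1)]]
x = [Fraction(4), Fraction(1)]
A1, A2 = split_by_sign(A)
# y1 and y2 would each be computed by the nonnegative (0-1) method; a plain product stands in here.
y = [u - v for u, v in zip(matvec(A1, x), matvec(A2, x))]
assert y == matvec(A, x)
```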
The above 0-1-binary option of a preferred embodiment of the invention with respect to a real matrix is particularly efficient when the decomposing binary matrices $A[-m_2], A[-m_2+1], \ldots, A[0], \ldots, A[m_1]$ are rather sparse. This may be a consequence of the entries of A being of various sizes and having non-uniform numbers of binary digits beyond the point. In such cases the zero padding necessary for the above uniform digital format causes a higher level of sparsity.

Another form of the preferred embodiment of the invention, based on U-binary distributed arithmetic, is described next. This form of the invention has the advantage of being better adapted to matrices having entries of both signs, and of being based on the faster U-method. In practice it is more efficient than the above 0-1-version when there is some uniformity in the size and accuracy of the entries of the matrix. The main features of the following method are similar to those of the 0-1-method.

The following observation is required for the description of the invention that follows. A real number t bounded by $2^{m_1}$ and having $m_2$ digits beyond the point can be represented as a U-binary sum in the following way:

$$t = \sum_{-m_2-1 \le k \le m_1-1} s_k \cdot 2^k + s_{m_1} \cdot \left(2^{m_1} - 2^{-m_2-1}\right)$$

where all $s_k$, for $-m_2-1 \le k \le m_1-1$, are ±1, and $s_{m_1} = 1$ when t is nonnegative and $s_{m_1} = -1$ when t is negative.
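One way to compute the digits $s_k$ is sketched below in Python. The method is an illustration and is not taken from the text. Writing $s_k = 2b_k - 1$ with $b_k \in \{0,1\}$ turns the ±1 part of the sum into an ordinary binary number, so the digits can be read off the bits of $(t - s_{m_1} S + S)/2$, where $S = 2^{m_1} - 2^{-m_2-1}$. The helper names `u_binary_digits` and `from_digits` are hypothetical. The loop checks the round trip for every representable t with $m_1 = m_2 = 2$.

```python
from fractions import Fraction

def u_binary_digits(t, m1, m2):
    """Return {k: s_k} for -m2-1 <= k <= m1, each s_k = +1 or -1, such that
    t = sum_{k=-m2-1}^{m1-1} s_k 2^k + s_{m1} (2^{m1} - 2^{-m2-1}).

    Assumes |t| < 2**m1 and that t is a multiple of 2**-m2.
    """
    t = Fraction(t)
    S = Fraction(2) ** m1 - Fraction(2) ** (-m2 - 1)   # = sum of 2^k, k = -m2-1 .. m1-1
    s_top = 1 if t >= 0 else -1
    u = t - s_top * S            # part carried by the +-1 digits; lies in [-S, S]
    # With s_k = 2 b_k - 1 (b_k in {0, 1}): u = 2 B - S, where B = sum b_k 2^k.
    bits = int((u + S) / 2 * 2 ** (m2 + 1))
    digits = {k: 2 * ((bits >> (k + m2 + 1)) & 1) - 1 for k in range(-m2 - 1, m1)}
    digits[m1] = s_top
    return digits

def from_digits(d, m1, m2):
    """Evaluate the U-binary sum back to a number."""
    S = Fraction(2) ** m1 - Fraction(2) ** (-m2 - 1)
    return sum(Fraction(2) ** k * d[k] for k in range(-m2 - 1, m1)) + d[m1] * S

# Round trip over every representable t with m1 = 2, m2 = 2 (|t| < 4, step 1/4).
for i in range(-15, 16):
    t = Fraction(i, 4)
    assert from_digits(u_binary_digits(t, 2, 2), 2, 2) == t
```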

Therefore an $r \times n$ real matrix A with entries of both signs can be decomposed into a sum of U-matrices in the following way:

$$A = \sum_{-m_2-1 \le k \le m_1-1} 2^k \cdot A[k] + \left(2^{m_1} - 2^{-m_2-1}\right) \cdot A[m_1]$$

where each of the matrices $A[k]$ is an $r \times n$ U-matrix.

Let $A^*$ be the $r \times ((m+1) \cdot n)$ U-matrix:

$$A^* = [A[-m_2-1], A[-m_2], \ldots, A[0], \ldots, A[m_1-1], A[m_1]]$$

This matrix is composed as a one-time task for each matrix, before the arrival of incoming data vectors.

In addition the $((m+1) \cdot n)$-dimensional column vector $x^*$ is defined by:

$$x^* = \begin{pmatrix} 2^{-m_2-1} x \\ \vdots \\ 2^{0} x \\ \vdots \\ 2^{m_1-1} x \\ \left(2^{m_1} - 2^{-m_2-1}\right) x \end{pmatrix}$$

It holds that:

$$A \cdot x = A^* \cdot x^*$$

This is a product of an $r \times ((m+1) \cdot n)$ U-matrix by an $((m+1) \cdot n)$-dimensional vector. The computation of this product is done by application of the U-matrix embodiment of the invention.
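The matrix version follows entry by entry. The Python sketch below uses the hypothetical helpers `u_planes` and `u_system` with exact `Fraction` arithmetic. It assumes $|a_{ij}| < 2^{m_1}$ and entries that are multiples of $2^{-m_2}$. It builds the $m+1$ U-matrices, lines them up into $A^*$, forms $x^*$ with the weights $2^{-m_2-1}, \ldots, 2^{m_1-1}$ and $2^{m_1} - 2^{-m_2-1}$, and checks that $A^* x^* = A x$. A plain product stands in for the U-matrix embodiment.

```python
from fractions import Fraction

def u_planes(A, m1, m2):
    """U-matrices A[-m2-1], ..., A[m1] of a real fixed-point matrix (|a| < 2**m1, step 2**-m2)."""
    S = Fraction(2) ** m1 - Fraction(2) ** (-m2 - 1)
    top = [[1 if a >= 0 else -1 for a in row] for row in A]                        # A[m1]
    bits = [[int((a - s * S + S) / 2 * 2 ** (m2 + 1)) for a, s in zip(row, srow)]
            for row, srow in zip(A, top)]
    planes = [[[2 * ((b >> (k + m2 + 1)) & 1) - 1 for b in brow] for brow in bits]
              for k in range(-m2 - 1, m1)]
    return planes + [top]

def u_system(A, x, m1, m2):
    """A* = [A[-m2-1], ..., A[m1]] and x* = (2^(-m2-1) x, ..., 2^(m1-1) x, (2^m1 - 2^(-m2-1)) x)."""
    planes = u_planes(A, m1, m2)
    A_star = [sum((P[i] for P in planes), []) for i in range(len(A))]
    weights = [Fraction(2) ** k for k in range(-m2 - 1, m1)]
    weights.append(Fraction(2) ** m1 - Fraction(2) ** (-m2 - 1))
    x_star = [w * xi for w in weights for xi in x]
    return A_star, x_star

def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

A = [[Fraction(-3, 4), Fraction(5, 2)], [Fraction(1, 4), Fraction(-2)]]
x = [Fraction(3), Fraction(-2)]
A_star, x_star = u_system(A, x, m1=2, m2=2)
assert all(v in (1, -1) for row in A_star for v in row)   # A* is a U-matrix
assert matvec(A_star, x_star) == matvec(A, x)
```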

As in the above method pertaining to the 0-1-binary decomposition, there is also here a vertical version for r = 1 or small r. It is completely analogous to the one described above, and there is no reason to repeat the details.

Let $l = \log_2((m+1) \cdot n) = \log_2(m+1) + \log_2(n)$. For $l > r$, the number of additions required to perform the above method is bounded by:

$$C(A) \le (m+1) \cdot n + 2^{r-1} + 2^{r/2+1} - r$$

As above, C(A) is defined to be the number of additions required by the above U-embodiment to compute $A \cdot x$. In greater generality, it holds that:

$$C(A) \le (1+\delta) \cdot \frac{(m+1) \cdot r \cdot n}{l}$$

where $0 < \delta < 1$ and $\delta$ goes to zero as $(m+1) \cdot n$ and r go to infinity.

EXAMPLES OF INDUSTRIAL APPLICATIONS 
Linear transformations are commonly used in every field of technology and science. Applications of the real matrix aspect of the invention in communication technology include products of multi-user detector (MUD) matrices (such as decorrelators or minimum mean square error (MMSE) matrices) by an output vector of a despreader. It is also applicable to least squares calculations, to Finite Impulse Response (FIR) filters, and to fully or partially computed Discrete Fourier Transforms (DFT). Partial DFTs, and DFTs of small and medium size where the FFT is not efficient, are especially suitable instances for this feature of the invention. The Discrete Cosine Transform (DCT) is another type of linear transformation whose computation may be improved by the invention, especially when it is only partially calculated or when its size is not large enough for higher-dimension fast algorithms to be very efficient.

In some digital signal processing applications, such as processing circuitry that employs FIR filters, the correlation of two relatively long vectors is required. One of the vectors may represent the taps of an FIR filter that operates on a second vector, representing an input to be filtered. A filtering operation consisting of partial convolution is represented by the product of a Toeplitz matrix by a vector. This is done efficiently by the real matrix aspect of the invention.
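To make the Toeplitz structure concrete, the following Python sketch builds the matrix whose product with the input vector yields the "valid" part of the convolution, and compares it with direct filtering. The names `fir_toeplitz` and `fir_direct` are hypothetical, and the taps and samples are made up. Each entry depends only on the difference of its row and column indices, which is what makes the matrix Toeplitz.

```python
def fir_toeplitz(h, n):
    """(n - L + 1) x n Toeplitz matrix T with (T x)[i] = sum_k h[k] x[i + L - 1 - k].

    Entry T[i][j] = h[i - j + L - 1] depends only on i - j (zero outside the taps).
    """
    L = len(h)
    return [[h[i - j + L - 1] if 0 <= i - j + L - 1 < L else 0 for j in range(n)]
            for i in range(n - L + 1)]

def fir_direct(h, x):
    """Direct 'valid' FIR filtering, for comparison."""
    L = len(h)
    return [sum(h[k] * x[i + L - 1 - k] for k in range(L)) for i in range(len(x) - L + 1)]

h = [1, 2, 3]            # hypothetical taps
x = [1, 0, -1, 2, 4]     # hypothetical input samples
T = fir_toeplitz(h, len(x))
y = [sum(t * xi for t, xi in zip(row, x)) for row in T]
assert y == fir_direct(h, x) == [2, 0, 5]
```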

COMPLEX MATRIX 
Example: Suppose that it is desired to compute the following sums:

$$\begin{aligned}
y_1 &= (1+j) x_1 + (1-j) x_2 + (-1-j) x_3 + (-1+j) x_4 + (1-j) x_5 + (-1+j) x_6 \\
y_2 &= (-1+j) x_1 + (1+j) x_2 + (1+j) x_3 + (-1-j) x_4 + (-1-j) x_5 + (1-j) x_6 \\
y_3 &= (1-j) x_1 + (-1-j) x_2 + (-1-j) x_3 + (1+j) x_4 + (-1+j) x_5 + (1+j) x_6
\end{aligned}$$

where the input scalars $x_1, x_2, \ldots, x_6$ are complex numbers.

The conventional prior art approach would require 66 real additions. This is seen when we take into account that a product of a factor of the sort (±1 ± j) by a complex number requires 2 real additions, and that an addition of two complex numbers requires 2 real additions. The 18 products therefore cost 36 real additions, and the 15 complex additions (5 per sum) cost another 30.
Two main options of preferred embodiments of the invention for doing this computation will be presented. The first preferred embodiment of the invention, called phase rotation plus GEM, uses the fact that:

$$\begin{aligned}
\tfrac{1}{2}(1+j)(1+j) &= j \\
\tfrac{1}{2}(1+j)(1-j) &= 1 \\
\tfrac{1}{2}(1+j)(-1+j) &= -1 \\
\tfrac{1}{2}(1+j)(-1-j) &= -j
\end{aligned}$$

Hence by multiplying all the above sums by $\tfrac{1}{2}(1+j)$, which is called phase rotation here, we get coefficients from the set $\{1, -1, j, -j\}$:

$$\begin{aligned}
\tfrac{1}{2}(1+j) y_1 &= j x_1 + x_2 - j x_3 - x_4 + x_5 - x_6 \\
\tfrac{1}{2}(1+j) y_2 &= -x_1 + j x_2 + j x_3 - j x_4 - j x_5 + x_6 \\
\tfrac{1}{2}(1+j) y_3 &= x_1 - j x_2 - j x_3 + j x_4 - x_5 + j x_6
\end{aligned}$$

According to a preferred embodiment of the invention these sums are computed by the GEM. Owing to the small size of this example, the gain over the conventional way is marginal here, but it is substantial in larger dimensions. When the dimensions are small, as in this example, other more conventional methods of computation may also be applied after the phase rotation step. Finally the result of each sum is multiplied by (1 − j) to obtain the desired result. Note that multiplications by j or (−1) are "organizational" operations requiring a modest amount of time and energy.

The second option of preferred embodiment is called the complex-U-method. It breaks the sums by opening the brackets of each coefficient in the following way:

$$\begin{aligned}
y_1 &= x_1 + (jx_1) + x_2 - (jx_2) - x_3 - (jx_3) - x_4 + (jx_4) + x_5 - (jx_5) - x_6 + (jx_6) \\
y_2 &= -x_1 + (jx_1) + x_2 + (jx_2) + x_3 + (jx_3) - x_4 - (jx_4) - x_5 - (jx_5) + x_6 - (jx_6) \\
y_3 &= x_1 - (jx_1) - x_2 - (jx_2) - x_3 - (jx_3) + x_4 + (jx_4) - x_5 + (jx_5) + x_6 + (jx_6)
\end{aligned}$$

The rest is done by applying the U-method. By the above table of s(n,r), this requires at most 30 real additions, since s(12, 3) = 12 + 3 = 15 is the number of complex additions.

Having demonstrated the basic principles by the above example, it is now appropriate to give a detailed description of the general case. To describe the preferred embodiments of the invention with respect to complex matrices, consider the sets that will be used as coefficients: $U_1 = \{1, -1, j, -j\}$ and $U_2 = \{1+j, 1-j, -1+j, -1-j\}$. A $U_1$-vector or $U_1$-matrix has its entries in $U_1$. Likewise a $U_2$-vector or $U_2$-matrix has its entries in $U_2$. Such matrices and vectors are common in wireless applications. In what follows it should be taken into account that the product of a $U_2$ number by a complex number requires 2 real additions, while the product of a $U_1$ number by a complex number involves relatively little complexity.
The first computational problem to be solved is the following: an $r \times n$ $U_2$-matrix A and an n-dimensional complex column input vector x are given, and it is desired to compute the product $y = A \cdot x$. The case r = 1, which is a scalar product, is included. Two main approaches to this computation will be presented, each to be preferred when suitable. The same processes, with slight modifications, are applicable when the data vector is real.
First the phase rotation plus GEM preferred embodiment of the invention is introduced. Let

$$B = \tfrac{1}{2}(1+j) \cdot A \quad \text{and} \quad z = \tfrac{1}{2}(1+j) \cdot y;$$

then B is an $r \times n$ $U_1$-matrix and $z = B \cdot x$. Next, the product $z = B \cdot x$ is computed by the GEM. Once z is computed, the output vector y is found by the product $y = (1-j) \cdot z$. The gain over conventional methods resulting from the initial phase rotation step is a saving of $2r(n-1)$ real additions: the $2rn$ real additions of the $U_2$ products are replaced by the $2r$ real additions of the final multiplication by $(1-j)$. This gain exists even in the case r = 1, that is, the case of a scalar product. Further gain results from the application of the GEM.
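The phase rotation step can be sketched in Python with built-in complex numbers. This is a minimal illustration with a hypothetical $2 \times 3$ $U_2$-matrix and made-up inputs. A plain $U_1$-matrix product stands in for the GEM, which is not reproduced here. The assertion inside `phase_rotation_product` (a hypothetical name) checks that $\tfrac{1}{2}(1+j)A$ indeed has its entries in $U_1$.

```python
U1 = {1, -1, 1j, -1j}

def phase_rotation_product(A, x):
    """y = A x for a U2-matrix A via z = B x with B = (1+j)/2 * A, then y = (1-j) z."""
    rot = (1 + 1j) / 2
    B = [[rot * a for a in row] for row in A]
    assert all(b in U1 for row in B for b in row)   # phase rotation yields a U1-matrix
    # Stand-in for the GEM: plain U1 products (sign changes and real/imaginary swaps in hardware).
    z = [sum(b * xi for b, xi in zip(row, x)) for row in B]
    return [(1 - 1j) * zi for zi in z]

A = [[1 + 1j, -1 + 1j, 1 - 1j], [-1 - 1j, 1 - 1j, 1 + 1j]]   # hypothetical U2-matrix
x = [2 + 1j, -1 + 3j, 4 - 2j]                               # hypothetical input
y = phase_rotation_product(A, x)
direct = [sum(a * xi for a, xi in zip(row, x)) for row in A]
assert all(abs(u - d) < 1e-12 for u, d in zip(y, direct))
```

Only the final multiplication by $(1-j)$ costs real additions, which is where the saving of $2r(n-1)$ comes from.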

The second preferred embodiment of the invention is called the U-complex-method. The first step is to represent A as a sum $A = A_1 + jA_2$, where $A_1$ and $A_2$ are U-matrices. Then consider the equality $A \cdot x = A_1 \cdot x + jA_2 \cdot x$. This equality implies that $A \cdot x$ can be computed through the product of the $r \times 2n$ U-matrix $A^* = [A_1, A_2]$ with the 2n-complex-dimensional column vector:

$$x^* = \begin{pmatrix} x \\ jx \end{pmatrix}$$

This is expressed in the equality $A \cdot x = A^* \cdot x^*$. The product $A^* \cdot x^*$ is then computed by the U-method.
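A minimal sketch of the U-complex split follows, using a hypothetical $2 \times 3$ $U_2$-matrix and made-up inputs. The real and imaginary parts of a $U_2$-matrix are both U-matrices, and the product with $x^* = (x, jx)$ reproduces $A x$. The function name `u_complex_product` is hypothetical, and a plain product stands in for the U-method.

```python
def u_complex_product(A, x):
    """A x for a U2-matrix A via A* = [A1, A2] (r x 2n U-matrix) and x* = (x, jx)."""
    A1 = [[int(a.real) for a in row] for row in A]   # real parts, all +-1
    A2 = [[int(a.imag) for a in row] for row in A]   # imaginary parts, all +-1
    A_star = [r1 + r2 for r1, r2 in zip(A1, A2)]
    x_star = list(x) + [1j * xi for xi in x]
    # The U-method would evaluate this r x 2n U-matrix product; a plain sum stands in here.
    return [sum(a * v for a, v in zip(row, x_star)) for row in A_star]

A = [[1 + 1j, -1 + 1j, 1 - 1j], [-1 - 1j, 1 - 1j, 1 + 1j]]   # hypothetical U2-matrix
x = [2 + 1j, -1 + 3j, 4 - 2j]                               # hypothetical input
y = u_complex_product(A, x)
```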

There is another alternative within this preferred embodiment of the invention, which is reasonable when r is small. Let

$$A^{**} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix};$$

this is a $2r \times n$ U-matrix. Then apply the U-method to compute the product $A^{**} \cdot x$. This in fact calculates both products $y_1 = A_1 \cdot x$ and $y_2 = A_2 \cdot x$. The process is completed with the summation $y = y_1 + j y_2$.
A variation on the above problem may arise in some applications that include Toeplitz matrix representations of PN correlators in CDMA. In this setting the matrix may also have zero entries. Accordingly, let $U'_1 = \{0, 1, -1, j, -j\}$ and $U'_2 = \{0, 1+j, 1-j, -1+j, -1-j\}$. A $U'_1$-vector or $U'_1$-matrix has its entries in $U'_1$. Likewise a $U'_2$-vector or $U'_2$-matrix has its entries in $U'_2$. An $r \times n$ $U'_2$-matrix A and an n-dimensional complex column input vector x are given. The goal is to compute the product $y = A \cdot x$. The case r = 1, which is a scalar product, is included.

The phase rotation plus GEM preferred embodiment of the invention will be discussed first. Let

$$B = \tfrac{1}{2}(1+j) \cdot A \quad \text{and} \quad z = \tfrac{1}{2}(1+j) \cdot y;$$

then B is an $r \times n$ $U'_1$-matrix and $z = B \cdot x$. Next the product $z = B \cdot x$ is computed by application of the GEM or, when the dimension is low, by more conventional methods. Finally, once z is computed, the output vector y is found by the product $y = (1-j) \cdot z$.
According to another preferred embodiment of the invention, A is first represented as a sum $A = A_1 + jA_2$, where $A_1$ and $A_2$ are (0,1,−1)-matrices, possibly Toeplitz. Then by the equality $A \cdot x = A_1 \cdot x + jA_2 \cdot x$, the product $A \cdot x$ may be computed through the product of the $r \times 2n$ (0,1,−1)-matrix $A^* = [A_1, A_2]$ with the 2n-complex-dimensional column vector:

$$x^* = \begin{pmatrix} x \\ jx \end{pmatrix}$$

Lastly, the product $A^* \cdot x^*$ is computed by the Toeplitz aspect or by the more general (0,1,−1) aspect of the invention.

Another (optional) preferred embodiment of the invention, reflecting its non-Toeplitz counterpart, is efficient when r is small. Let

$$A^{**} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix};$$

this is a $2r \times n$ (0,1,−1)-matrix. Then apply the (0,1,−1)-aspect of the invention to compute the product $A^{**} \cdot x$. This in fact calculates the products $y_1 = A_1 \cdot x$ and $y_2 = A_2 \cdot x$. Finally, the process is completed with the sum $y = y_1 + j y_2$.

The complex chapter is now concluded with a method for computing the product of a general complex $r \times n$ matrix $A \in \mathbb{C}^{r \times n}$ by a real or complex n-dimensional vector x. According to one preferred embodiment of the invention, first represent A as a sum $A = A_1 + jA_2$, where $A_1$ and $A_2$ are real matrices. Next, by the equality $A \cdot x = A_1 \cdot x + jA_2 \cdot x$, it follows that $A \cdot x$ can be computed through the product of the $r \times 2n$ real matrix $A^* = [A_1, A_2]$ with the 2n-dimensional column vector

$$x^* = \begin{pmatrix} x \\ jx \end{pmatrix},$$

since $A \cdot x = A^* \cdot x^*$. Finally the product is computed by the real matrix method.

According to another (optional) preferred embodiment of the invention, put

$$A^{**} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}.$$

This is a $2r \times n$ real matrix. Then apply the real matrix method to compute the product $A^{**} \cdot x$. This calculates the products $y_1 = A_1 \cdot x$ and $y_2 = A_2 \cdot x$. Finally, the process is completed with the sum $y = y_1 + j y_2$.
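Both forms for a general complex matrix can be checked with a short Python sketch. The function `complex_via_real` is hypothetical, the data are made up, and plain products stand in for the real matrix method. The horizontal form multiplies $[A_1, A_2]$ by $(x, jx)$. The vertical form obtains $y_1$ and $y_2$ from one product with the stacked matrix and returns $y_1 + j y_2$.

```python
def complex_via_real(A, x):
    """A x for a complex matrix A = A1 + jA2 via real matrices, in both forms of the text."""
    A1 = [[a.real for a in row] for row in A]
    A2 = [[a.imag for a in row] for row in A]
    r = len(A)
    # Horizontal form: r x 2n real matrix [A1, A2] times x* = (x, jx).
    A_star = [r1 + r2 for r1, r2 in zip(A1, A2)]
    x_star = list(x) + [1j * xi for xi in x]
    y_h = [sum(a * v for a, v in zip(row, x_star)) for row in A_star]
    # Vertical form: 2r x n real matrix [A1; A2] gives y1 and y2, then y = y1 + j y2.
    w = [sum(a * xi for a, xi in zip(row, x)) for row in A1 + A2]
    y_v = [w[i] + 1j * w[r + i] for i in range(r)]
    return y_h, y_v

A = [[0.5 - 2j, 3 + 0.25j], [-1.5 + 1j, 2j]]   # hypothetical complex matrix
x = [1 + 1j, -2 + 0.5j]                        # hypothetical input
y_h, y_v = complex_via_real(A, x)
direct = [sum(a * xi for a, xi in zip(row, x)) for row in A]
assert all(abs(u - d) < 1e-12 and abs(v - d) < 1e-12 for u, v, d in zip(y_h, y_v, direct))
```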

Finally, a product of a Toeplitz matrix with $U_2$-coefficients by a vector is done by application of the Toeplitz technique above.

EXAMPLES OF INDUSTRIAL APPLICATIONS OF TOEPLITZ-MATRIX, (0,1,-1)-MATRIX AND COMPLEX-MATRIX TECHNIQUES
IS-95 Searcher: The IS-95 CDMA system allows a number of different base stations in a small geographic area to use the same segment of spectrum simultaneously for transmitting data to mobile receivers. The data from different base stations are differentiated by way of the PN sequence used to spread the transmitted data: each base station transmits at a different phase of the PN sequence. The task of the searcher mechanism in the mobile receiver is to identify the different pilot signals transmitted by the surrounding base stations by aligning with their PN phase. It is also applied to differentiate between several multipath signals (meaning echoes) arriving from the same base station. A similar process is applied in the initial synchronization procedure.

A searcher is required to test a number of hypotheses by partially correlating the received signal, for each hypothesis, with a locally generated PN sequence. The sequence is shifted for each hypothesis, and the correlation is performed over a fixed number of signal elements (chips). Normally the searcher is required to search all the hypotheses in a given window, where each time the sequence is shifted by one chip. A searcher may be implemented in a DS-CDMA system by constructing a matrix A whose rows are composed of the above shifted PN subsequences. The search results are then given by the vector $y = A \cdot x$, where x is a vector representing the incoming signal sampled at single-chip periods. According to a preferred embodiment of the invention, the above inventive algorithms for efficient linear transformations are implemented so as to obtain the vector y with reduced consumption of resources. Numerous applications of the invention may also be useful in the searcher of the proposed wideband CDMA standards.
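The searcher matrix can be illustrated with a toy example. The sketch below is not the IS-95 PN generator. For illustration only, it uses two periods of the length-7 m-sequence generated by $x^3 + x + 1$, mapped to ±1, in place of the real short PN sequence. Row h of the hypothetical `searcher_matrix` is the sequence shifted by h chips, so consecutive rows are shifts of each other. For a noise-free input delayed by 3 chips, the correlation peaks at hypothesis 3. By the two-level autocorrelation of m-sequences, every other hypothesis gives −1.

```python
def searcher_matrix(pn, window, hypotheses):
    """Row h is the PN subsequence pn[h : h + window], i.e. the sequence shifted by h chips."""
    if len(pn) < hypotheses + window - 1:
        raise ValueError("PN sequence too short for the requested search window")
    return [pn[h:h + window] for h in range(hypotheses)]

# Toy +-1 PN sequence: two periods of the length-7 m-sequence generated by x^3 + x + 1.
period = [1, 1, 1, -1, -1, 1, -1]
pn = period * 2
x = pn[3:10]                      # received chips: the sequence delayed by 3 chips, noise-free
A = searcher_matrix(pn, window=7, hypotheses=7)
y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
best = max(range(len(y)), key=lambda h: y[h])
print(y, best)                    # [-1, -1, -1, 7, -1, -1, -1] 3
```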

MULTI-PRODUCT 

Another preferred embodiment of the invention relates to a situation where partial sums of the U-matrix-by-vector product are desired. This may occur in CDMA communication applications when several codes with different rates (spreading factors) are tested simultaneously. This embodiment is best studied by readers who have already acquired a substantial command of the previous aspects of the invention. It is introduced by the following example, which gives a first idea of this rather involved method without an abundance of details. However, no example of reasonable size can describe all aspects of this embodiment. The reader may also refer to the summary of the invention, item 6.

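Example: Consider the following $5 \times 8$ U-matrix:

$$A = \begin{pmatrix}
1 & 1 & -1 & 1 & 1 & -1 & -1 & -1 \\
-1 & 1 & -1 & -1 & 1 & 1 & -1 & 1 \\
1 & -1 & 1 & 1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & 1 & -1 & -1 & 1 & -1 \\
-1 & -1 & 1 & 1 & -1 & 1 & -1 & -1
\end{pmatrix}$$

and an 8-dimensional input vector:

$$x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)^{T}$$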

Suppose that, in the multi-product terminology, the spreading factor of lines 1 and 2 is 2, the spreading factor of lines 3 and 4 is 4, and the spreading factor of line 5 is 8. This means that in the first two lines every two successive elements are summed, in the third and fourth lines every four successive elements are summed, and in the fifth line the entire line is summed. The convention is that the spreading factor is non-decreasing, that is, the spreading factor of each line is equal to or greater than the spreading factor of the previous line. This terminology will later be defined precisely.
It is implied by the above that the following sums should be computed:

$$\begin{aligned}
&x_1 + x_2, \quad -x_3 + x_4, \quad x_5 - x_6, \quad -x_7 - x_8 \\
&-x_1 + x_2, \quad -x_3 - x_4, \quad x_5 + x_6, \quad -x_7 + x_8 \\
&x_1 - x_2 + x_3 + x_4, \quad x_5 - x_6 + x_7 - x_8 \\
&x_1 + x_2 - x_3 + x_4, \quad -x_5 - x_6 + x_7 - x_8 \\
&-x_1 - x_2 + x_3 + x_4 - x_5 + x_6 - x_7 - x_8
\end{aligned}$$

First the matrix is split into four $5 \times 2$ U-matrices, where the horizontal dimension reflects the lowest spreading factor. Then the U-method is applied, on an extremely basic level, to demonstrate how additions are saved by this new aspect. Thus, applying the elimination of equivalent lines, only two additions ($x_1 + x_2$ and $x_1 - x_2$) are required to compute all the first sums of the lines:

$$\begin{pmatrix} x_1 + x_2 \\ -x_1 + x_2 \\ x_1 - x_2 \\ x_1 + x_2 \\ -x_1 - x_2 \end{pmatrix}$$

Similarly, only two additions are required for the second sum of each line:

$$\begin{pmatrix} -x_3 + x_4 \\ -x_3 - x_4 \\ x_3 + x_4 \\ -x_3 + x_4 \\ x_3 + x_4 \end{pmatrix}$$

and so on. Hence a total of 8 additions is required for this part. Four additional additions are necessary to complete the required sums of lines 3 and 4 (one for each of their two sums), and three more additions are required to compute the single sum of line 5 from its four partial sums. Hence a total of 15 additions is required by an application of a preferred embodiment of the invention. Prior art, conventional brute-force methods would have required 27 additions for the same task (4 + 4 + 6 + 6 + 7, line by line).
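The count of the example can be reproduced with a small Python sketch of the M-1-method. This is a hypothetical illustration, not the patented apparatus. Within each width-$p_1$ section, lines that are equal up to sign share one evaluation. Each non-equivalent line is evaluated with $p_1 - 1$ plain additions as a stand-in for the U-binary method; for $p_1 = 2$ this coincides with the two additions per section described above. The input values are made up.

```python
def m1_method(A, x, p):
    """Sketch of the M-1-method: returns (A x[p], number of additions used).

    Each width-p1 section is evaluated once per class of lines equal up to sign;
    as a stand-in for the U-binary method, a class is evaluated with p1 - 1 plain additions.
    """
    n, p1 = len(x), p[0]
    adds = 0
    partial = [[] for _ in A]          # partial[i][s]: section-s sum of line i
    for start in range(0, n, p1):
        cache = {}                     # canonical sign pattern -> section sum
        for i, row in enumerate(A):
            pattern = row[start:start + p1]
            sign = pattern[0]
            canon = tuple(sign * a for a in pattern)   # normalised so that its first entry is +1
            if canon not in cache:
                cache[canon] = sum(c * v for c, v in zip(canon, x[start:start + p1]))
                adds += p1 - 1
            partial[i].append(sign * cache[canon])
    result = []
    for i, pi in enumerate(p):
        g = pi // p1                   # sections per required sum of line i
        sums = [sum(partial[i][s:s + g]) for s in range(0, len(partial[i]), g)]
        adds += len(sums) * (g - 1)
        result.append(sums)
    return result, adds

A = [[1, 1, -1, 1, 1, -1, -1, -1],
     [-1, 1, -1, -1, 1, 1, -1, 1],
     [1, -1, 1, 1, 1, -1, 1, -1],
     [1, 1, -1, 1, -1, -1, 1, -1],
     [-1, -1, 1, 1, -1, 1, -1, -1]]
p = [2, 2, 4, 4, 8]
x = [3, -1, 4, 1, -5, 9, 2, -6]     # hypothetical input
sums, adds = m1_method(A, x, p)
assert adds == 15
```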
The environment of the current aspect of the invention includes a U-matrix, possibly of large dimensions, where in each line sub-sums over equal intervals are required, as in the above example. The input vector is real or complex. The matrix may be subdivided by lines into several sub-matrices, each of which is computed separately and independently by the method used in the above example. Integrated into this embodiment is a method that finds a near-best subdivision in terms of saving additions, and hence of reducing complexity. The tool for this is an additional processor or apparatus based on dynamic programming, which analyzes the various subdivisions using the table, the bound and the recursive formula of the complexity s(n,r) of the U-method. A very precise formulation is required for the development of this embodiment, and several new definitions are needed as preliminary material.

Let $v = (v_1, v_2, \ldots, v_n)$ be a vector and p a positive integer that divides n (in short: $p \mid n$). Define $v[p]$ to be the vector of vectors formed by subdividing v into sections of length p. Thus:

$$v[p] = ((v_1, v_2, \ldots, v_p), (v_{p+1}, v_{p+2}, \ldots, v_{2p}), \ldots, (v_{n-p+1}, v_{n-p+2}, \ldots, v_n))$$

Denote the sections by:

$$v[p, k] = (v_{(k-1)p+1}, \ldots, v_{kp}) \quad \text{for } 1 \le k \le n/p$$

The integer p in this context is called the spreading factor. Observe that a multi-vector is a structure akin to the structure of a matrix. The next item, the multi-scalar-product of multi-vectors, is a cross between the matrix product and the usual scalar product. Take two n-dimensional vectors $v = (v_1, v_2, \ldots, v_n)$ and $w = (w_1, w_2, \ldots, w_n)$; then the p-multi-scalar-product of v and w is defined by:

$$vw[p] = (v[p,1] \cdot w[p,1],\; v[p,2] \cdot w[p,2],\; \ldots,\; v[p,n/p] \cdot w[p,n/p])$$

where the internal products $v[p,1] \cdot w[p,1], v[p,2] \cdot w[p,2], \ldots$ are usual scalar products. Note that the result of this product is an n/p-dimensional vector.

Let A be an $r \times n$ matrix with lines $A_1, \ldots, A_r$, and let $p = (p_1, p_2, \ldots, p_r)$ be a positive integer vector. It is said that p divides n if $p_i$ divides n for every $1 \le i \le r$. This is denoted by $p \mid n$.

Assume now that $p \mid n$. Take an n-dimensional vector x and define the p-multi-product of A by x, denoted by $A \cdot x[p]$, to be the vector of vectors:

$$A \cdot x[p] = (A_1 x[p_1], A_2 x[p_2], \ldots, A_r x[p_r])$$

where $A_i x[p_i]$ is the $p_i$-multi-scalar-product of the line $A_i$ with x. The current embodiment of the invention improves the computation of such products in a setup that is described henceforth.
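The definitions translate directly into code. The following Python sketch uses the hypothetical helper names `sections`, `multi_scalar_product` and `multi_product` and computes $v[p]$, $vw[p]$ and $A \cdot x[p]$ by brute force. It is meant only to pin down the notation, not to be efficient.

```python
def sections(v, p):
    """v[p]: consecutive sections of length p (p must divide len(v))."""
    if len(v) % p:
        raise ValueError("p must divide the length of the vector")
    return [v[i:i + p] for i in range(0, len(v), p)]

def multi_scalar_product(v, w, p):
    """vw[p] = (v[p,1].w[p,1], ..., v[p,n/p].w[p,n/p]), an n/p-dimensional vector."""
    return [sum(a * b for a, b in zip(vs, ws)) for vs, ws in zip(sections(v, p), sections(w, p))]

def multi_product(A, x, p):
    """A x[p] = (A_1 x[p_1], ..., A_r x[p_r]) for lines A_i and spreading factors p_i."""
    return [multi_scalar_product(row, x, pi) for row, pi in zip(A, p)]

print(multi_scalar_product([1, 2, 3, 4], [1, 1, -1, 1], 2))   # [3, 1]
```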

A multi-product system. A multi-product system (in short, MP-system) is a setting that includes as parameters the integers r, n and a positive integer vector $p = (p_1, p_2, \ldots, p_r)$ such that:

$$p_1 \mid p_2,\; p_2 \mid p_3,\; \ldots,\; p_{r-1} \mid p_r \;\text{ and }\; p_r \mid n$$

The integers $p_1, p_2, \ldots, p_r$ are called the spreading factors of the system. The parameters of the MP-system are written in the following way:

$$P = (r, n, p)$$
To these parameters now attach an $r \times n$ U-matrix A and an n-dimensional real vector x. The goal is to compute the product $A \cdot x[p]$ efficiently. The entire MP-system is written in the following way:

$$S = (r, n, p, A, x)$$

When all the integers $p_1, p_2, \ldots, p_r, n$ are also powers of 2, the system is called a binary multi-product system, or in short, a BMP-system.
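The divisibility conditions of an MP-system, and the extra power-of-2 condition of a BMP-system, can be checked as follows. This is a minimal Python sketch with hypothetical function names.

```python
def is_mp_system(r, n, p):
    """Check p1 | p2 | ... | pr and pr | n for an r-line MP-system P = (r, n, p)."""
    return (r == len(p) >= 1 and all(pi > 0 for pi in p)
            and all(p[i + 1] % p[i] == 0 for i in range(r - 1))
            and n % p[-1] == 0)

def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0

def is_bmp_system(r, n, p):
    """An MP-system whose spreading factors and n are all powers of 2."""
    return is_mp_system(r, n, p) and all(is_power_of_two(k) for k in [*p, n])

print(is_bmp_system(5, 8, [2, 2, 4, 4, 8]))   # True: the 5 x 8 example above
```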

The M-1-method. This feature of the invention is a straightforward adaptation of the U-method to an MP-system, and it is illustrated by the above example. In practice it will usually be applied to subsystems of an MP-system after the near-optimal subdivision described later.

Given an MP-system $S = (r, n, p, A, x)$, the method is based on a horizontal division of the matrix into sub-matrices with respect to the smallest spreading factor $p_1$; each sub-matrix is of width $p_1$. It begins by computing the sums of the first $p_1$ real numbers of the vector x with the U-coefficients of the matrix A. This is done simultaneously in all lines by the U-binary method. It then goes to the next $p_1$ columns and repeats the same process, continuing in this fashion until all n variables are exhausted. Next it goes to each of the lines where $p_i > p_1$ and completes the summation process in a trivial way.

The first step of the U-method on each sub-matrix is to scan out lines that are a straight or a negative copy of other lines. Hence, for example, if $p_1 = 2$, then at most two additions are needed in each section, regardless of r. In general no more than $2^{p_1-1}$ lines are considered by the U-binary method in every section. Moreover, one application intended by this text is when A is a Hadamard matrix. In this case there are no more than $p_1$ non-equivalent lines in each section. Thus another source of information is introduced, containing the bound on how many non-equivalent lines may appear in each sub-matrix. It is contained in a function denoted by z (stored as a table), which depends on the type of the matrix at hand. Its parameters are $p_1$ and r, and it is written $z(p_1, r)$. For example, when A is Hadamard then $z(4, 6) = 4$ and $z(8, 5) = 5$; when A is a general U-matrix then $z(4, 40) = 8$. It always holds that $z(p_1, r) \le \min\{2^{p_1-1}, r\}$, and in the case that A is Hadamard, $z(p_1, r) = \min\{p_1, r\}$.
The computation of the complexity of the M-1-method is based on the table and the recursive formula of s(r) that appear in the description of the U-binary method above. For every positive integer y put $s'(y) = s(y) + y$. Recall also that $s'(y) \le 2^{y-1} + 2^{y/2+1}$, and this inequality is rather tight. This gives an intuitive view of the magnitude of the complexity in the next formula. The number of additions done by the M-1-method is thus bounded by the following term:

$$\begin{aligned}
C(n,r,p,z) &= \frac{n}{p_1}\left(p_1 + s(z(p_1,r))\right) + \frac{n}{p_2}\left(\frac{p_2}{p_1} - 1\right) + \frac{n}{p_3}\left(\frac{p_3}{p_1} - 1\right) + \cdots + \frac{n}{p_r}\left(\frac{p_r}{p_1} - 1\right) \\
&= n \cdot \left(1 + \frac{s(z(p_1,r)) + r - 1}{p_1} - \left(\frac{1}{p_2} + \frac{1}{p_3} + \cdots + \frac{1}{p_r}\right)\right) \\
&= n \cdot \left(1 + \frac{s(z(p_1,r)) + r}{p_1} - \left(\frac{1}{p_1} + \frac{1}{p_2} + \cdots + \frac{1}{p_r}\right)\right)
\end{aligned}$$

Note some trivial but not insignificant instances of this formula. The first is when the matrix has one line, that is, when r = 1; then:

$$C(n,1,p,z) = n - n/p_1$$

The second is the case of a uniform spreading factor, that is, when $p_1 = p_2 = \cdots = p_r$; then:

$$C(n,r,p,z) = n \cdot \left(1 + s(z(p_1,r))/p_1\right)$$

Obviously, the M-1-method is by no means efficient when r is large relative to $p_1$. It is largely a stepping stone for a smarter method developed on its foundations. This stronger method works by horizontally subdividing the matrix and applying the M-1-method to each sub-matrix separately. To find, with fewer calculations, a subdivision structure bearing a smaller total number of additions, it is useful to have the following shorter version of the above formula. So define:

$$C^*(n,r,p,z) = 1 + \frac{s(z(p_1,r)) + r}{p_1}$$

so that $C(n,r,p,z) = n \cdot \left(C^*(n,r,p,z) - (1/p_1 + \cdots + 1/p_r)\right)$.
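The bound can be evaluated with exact arithmetic. The Python sketch below computes the $s'$ table from the recursion $s'(y) = 2^{y-1} + s'(y_1) + s'(y_2)$. The base case $s'(1) = 0$ and the balanced split $y_1 = \lfloor y/2 \rfloor$, $y_2 = y - y_1$ are assumptions made here, since they belong to the U-binary description outside this section. With them, $s(3) = s'(3) - 3 = 3$, which matches the value s(12, 3) = 12 + 3 used in the complex example. The hypothetical worst-case table `z_general` uses $z(p_1, r) = \min\{2^{p_1-1}, r\}$. For the $5 \times 8$ example the formula gives 15 additions.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def s_prime(y):
    """s'(y) = s(y) + y from s'(y) = 2^(y-1) + s'(y1) + s'(y2).

    Assumed here (defined with the U-binary method, not in this section):
    the base case s'(1) = 0 and the balanced split y1 = y // 2, y2 = y - y1.
    """
    if y == 1:
        return 0
    y1 = y // 2
    return 2 ** (y - 1) + s_prime(y1) + s_prime(y - y1)

def z_general(p1, r):
    """Worst-case count of non-equivalent lines per section for a general U-matrix."""
    return min(2 ** (p1 - 1), r)

def m1_additions(n, p, z=z_general):
    """C(n, r, p, z) = n * (1 + (s(z(p1, r)) + r) / p1 - (1/p1 + ... + 1/pr))."""
    r, p1 = len(p), p[0]
    zz = z(p1, r)
    s_z = s_prime(zz) - zz
    return n * (1 + Fraction(s_z + r, p1) - sum(Fraction(1, pi) for pi in p))

print(m1_additions(8, [2, 2, 4, 4, 8]))   # 15, the count of the 5 x 8 example
```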
Subdivision of multi-systems and the generalized notion of multi-vector. Next, a formulation is made of a command that subdivides a matrix along horizontal lines. This command is represented by a subdividing vector $\mathbf{r}$. The result is the breaking of the original problem into a few sub-problems of the same width n, each solved by the above M-1-method. The subdivision will later be attached to an inventive algorithm to maximize efficiency. Take an r-dimensional integer vector $p = (p_1, p_2, \ldots, p_r)$ of spreading factors, and an $r \times n$ U-matrix:
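$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{pmatrix}$$

and take integers k, m where $1 \le k \le m \le r$. Define first a section of the vector p:

$$p(k,m) = (p_k, \ldots, p_m)$$

It is simply taking the components with indices k to m. Define a section of the matrix A:

$$A(k,m) = \begin{pmatrix} a_{k1} & \cdots & a_{kn} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

Likewise, it means taking the lines with indices k to m.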

Next, consider an integer vector $\mathbf{r} = (r(1), \ldots, r(t+1))$ satisfying:

$$k = r(1) < r(2) < \cdots < r(t) < r(t+1) = m+1$$

which is an instrument for splitting into sections. First, $\mathbf{r}$ is a tool to create a subdivision of p into a vector of vectors $p[\mathbf{r}]$, in the following manner:

$$p[\mathbf{r}] = ((p_{r(1)}, \ldots, p_{r(2)-1}), (p_{r(2)}, \ldots, p_{r(3)-1}), \ldots, (p_{r(t)}, \ldots, p_{r(t+1)-1}))$$

The sub-vectors are denoted by:

$$\begin{aligned}
p[\mathbf{r}, 1] &= (p_{r(1)}, \ldots, p_{r(2)-1}) \\
p[\mathbf{r}, 2] &= (p_{r(2)}, \ldots, p_{r(3)-1}) \\
&\;\;\vdots \\
p[\mathbf{r}, t] &= (p_{r(t)}, \ldots, p_{r(t+1)-1})
\end{aligned}$$

The rows of the matrix A are similarly subdivided in accordance with the subdividing vector $\mathbf{r}$, in the following manner: for all integers $1 \le q \le t$,

$$A[\mathbf{r}, q] = (a_{ij} : r(q) \le i < r(q+1),\; 1 \le j \le n)$$

The M-method for a given subdivision. This section is the main step in the formation of a mechanism that finds a low-complexity subdivision, a mechanism that is central to this embodiment. Suppose we are given a line-subdivision of the matrix and perform the M-1-method on each sub-matrix separately. The goal is to estimate the total number of additions, so that a subdivision with fewer additions can be found in the next stage. Fix an MP-system $S = (r, n, p = (p_1, p_2, \ldots, p_r), A, x, z)$ and a subdivision vector $\mathbf{r} = (r(1), \ldots, r(t+1))$ where

$$1 \le k = r(1) < r(2) < \cdots < r(t) < r(t+1) = m+1 \le r+1.$$

For $1 \le q \le t$ the MP-subsystem $S_q$ is given by:

$$S_q = (r(q+1) - r(q),\; n,\; p[\mathbf{r},q],\; A[\mathbf{r},q],\; x,\; z)$$
The total number of additions of all subsystems is:

$$\begin{aligned}
C(n,r,\mathbf{r},p,z) &= \sum_{1 \le q \le t} C(n,\; r(q+1) - r(q),\; p[\mathbf{r},q],\; z) \\
&= n \cdot \left(\sum_{1 \le q \le t} \frac{s(z(p_{r(q)},\, r(q+1)-r(q))) + r(q+1) - r(q)}{p_{r(q)}} - \left(\frac{1}{p_k} + \cdots + \frac{1}{p_m}\right) + t\right)
\end{aligned}$$

The next goal is to develop an efficient method that finds the subdivision which minimizes this term. It is more efficient to do the calculations without computing, at each stage, the repeated additive sub-term $1/p_k + \cdots + 1/p_m$ and the factor n. Thus we define:

$$C^*(n,r,\mathbf{r},p,z) = \sum_{1 \le q \le t} \frac{s(z(p_{r(q)},\, r(q+1)-r(q))) + r(q+1) - r(q)}{p_{r(q)}} + t$$

A near-optimal subdivision. At this stage we study the properties of the subdivision that minimizes the number of additions given by the term $C(n,r,\mathbf{r},p,z)$ above. First we give an abstract formula for the number of additions when the subdivision has the minimal worst-case bound on a given subinterval of the original problem.

Consider then an MP-system $S = (r, n, p = (p_1, p_2, \ldots, p_r), A, x, z)$ and integers k, m where $1 \le k \le m \le r$. Define:
$$h(k,m) = \min\{C(n,\, m-k+1,\, \mathbf{r},\, p(k,m),\, z) : \mathbf{r} = (r(1), \ldots, r(t+1)),\; k = r(1) < r(2) < \cdots < r(t) < r(t+1) = m+1\}$$

$$h^*(k,m) = \min\{C^*(n,\, m-k+1,\, \mathbf{r},\, p(k,m),\, z) : \mathbf{r} = (r(1), \ldots, r(t+1)),\; k = r(1) < r(2) < \cdots < r(t) < r(t+1) = m+1\}$$

It holds that:

$$h(k,m) = n \cdot \left(h^*(k,m) - \left(\frac{1}{p_k} + \cdots + \frac{1}{p_m}\right)\right)$$

The following recursive formulas hold:

$$h(k,m) = \min\{C(n,\, m-k+1,\, p(k,m),\, z),\; \min\{h(k,q-1) + h(q,m) : k < q \le m\}\}$$

$$h^*(k,m) = \min\{C^*(n,\, m-k+1,\, p(k,m),\, z),\; \min\{h^*(k,q-1) + h^*(q,m) : k < q \le m\}\}$$

where in both formulas min(empty set) = infinity.

These expressions are used by the following dynamic-programming code, where the substructures are the intervals between $k$ and $m$. To reduce the complexity of this code, $h^*$ replaces $h$ on the route to discovering the best substructure; this has no effect on the final result. To find the first step in the optimal subdivision we compute, for every $k$ and $m$ with $1\le k\le m\le r$, the $q$ that minimizes the expression $h(k,q-1)+h(q,m)$. This is the same $q$ that minimizes $h^*(k,q-1)+h^*(q,m)$, since the two expressions differ only by the positive factor $n$ and an additive term that does not depend on $q$. Thus define:

$q(k,m)=k$ when $h^*(k,m)=C^*(n,m-k+1,p(k,m),z)$, and otherwise:

$q(k,m)=$ the smallest $q$ with $k<q\le m$ and $h^*(k,m)=h^*(k,q-1)+h^*(q,m)$.

Now the optimal subdivision code can be formed.

A dynamic programming inventive code that finds the optimal subdivision. The following code receives as data the parameters of an MP-system $P=(r,n,p,z)$ and produces as output the subdivision $\mathbf{r}$ on which the M-method performs optimally. It also returns the complexity of the optimal M-method.

To run it efficiently, a table of $s'(r)$ is computed and stored beforehand. This is done by a fast code that calculates this table using the recursion formula:

$$s'(r)=2^r-1+s'(r(1))+s'(r(2))$$

Optimal Subdivision Tables (n,r,p,z)
for b going from 0 to r-1 do
    for k going from 1 to r-b do
        m := k+b
        h*(k,m) ← C*(n, m-k+1, p(k,m), z)
        q(k,m) ← k
        for q going from k+1 to m do
            d ← h*(k,q-1) + h*(q,m)
            if d < h*(k,m) then
                h*(k,m) ← d and
                q(k,m) ← q
return the tables h* and q.

What is left now is to obtain the optimal subdividing vector from the table $q(k,m)$ constructed by this procedure. This is done by the following code, which creates the set R containing the components of the optimal vector $\mathbf{r}$.

Optimal Subdivision Vector (n,r,p)
Set: R := empty set, k := 1, m := r
Find Vector (k,m):
    if q(k,m) > k then
        put q := q(k,m)
        add q to the set R
        Find Vector (q,m)
        Find Vector (k,q-1)
Return the set R.
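The two procedures above can be sketched in Python. This is a minimal sketch under stated assumptions, not the document's own code: `block_cost(k, m)` is a hypothetical caller-supplied function standing in for $C^*(n,m-k+1,p(k,m),z)$, whose ingredients $s'$, $z$ and $p$ are defined elsewhere; rows are numbered from 1 as in the text; and the split index is called `s` so that it is not confused with the table `q`.

```python
def optimal_subdivision_tables(r, block_cost):
    """Fill h*(k, m) and q(k, m) for 1 <= k <= m <= r (rows are 1-based).

    block_cost(k, m) is a caller-supplied stand-in for C*(n, m-k+1, p(k,m), z),
    the cost of handling rows k..m as one undivided block.
    """
    h = [[float("inf")] * (r + 2) for _ in range(r + 2)]
    q = [[0] * (r + 2) for _ in range(r + 2)]
    for b in range(r):                        # b = m - k: short intervals first
        for k in range(1, r - b + 1):
            m = k + b
            h[k][m] = block_cost(k, m)        # option 1: no split
            q[k][m] = k                       # q(k, m) = k marks "no split"
            for s in range(k + 1, m + 1):     # option 2: rows k..s-1 and s..m
                d = h[k][s - 1] + h[s][m]
                if d < h[k][m]:               # strict, so the smallest best s wins
                    h[k][m], q[k][m] = d, s
    return h, q


def optimal_subdivision_vector(r, q):
    """Collect the split points r(2) < ... < r(t) from the q table."""
    R = []

    def find(k, m):
        if q[k][m] > k:
            s = q[k][m]
            R.append(s)
            find(s, m)
            find(k, s - 1)

    find(1, r)
    return sorted(R)
```

With the hypothetical cost `lambda k, m: 2 if m - k + 1 <= 2 else 10` and r = 4, the tables give $h^*(1,4)=4$ and the split points `[3]`, so the subdivision vector is (1, 3, 5): two blocks of two rows each.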

Now that the optimal subdivision $\mathbf{r}$ is found, the M-method is run on this subdivision.
IMPLEMENTATIONS
The following preferred embodiments of the invention provide inventive algorithms which are detailed implementations of the above methods. They enable an additional reduction in the memory and energy resources required to perform the above methods. They have the dual goal of enhancing the above methods and of providing the instructions required to build the apparatus. One underlying assumption regarding these implementations is that the transformation matrix is generic and may be considered as arbitrarily chosen from the set of matrices having a given set of entries. Another is that the number of rows is small enough compared with the number of columns that the above-mentioned fundamental bound

number of rows < log(number of columns)

is satisfied. Hence some steps that are suitable in other circumstances, such as a check for equivalent lines, are redundant here.

The mappings described in the text that follows channel the flow of data from one location to the next. Each location is assigned a binary address corresponding to a column of the matrix at a given iteration. In the case of a U-matrix, a (-1)-component of the column corresponds to 1 in the address, and a 1-component of the column corresponds to 0 in the address. A similar correspondence is defined for the $U_1$-matrix.

The implementations that are described include a first step where the incoming $x_j$-signals are added to preset destinations determined by their corresponding columns, where each $x_j$ may be multiplied by a sign and/or a power of 2. In the subsequent iterations, whenever a column is split in accordance with the invention, the assigned address of each of the two splits is less than or equal to the address of the given column. This ensures that every piece of information stored in some location of the memory is sent to its destination before it is processed and lost. In addition, if one of the split halves of a given column is zero, the address representing this column is used for the next iteration. These properties are accomplished by a novel construction of the data-flow maps, which is set to minimize the movements of data. At each iteration a maximal number of addresses remain untouched, including all those whose present content can be used by the next iterations.
The mechanism of the following implementations is better understood in view of the above-described GEM preferred embodiment of the invention. A finite set of numbers $S$ that includes the number 1 is considered, and in this context a matrix of r lines is termed a complete S-r-matrix if it consists of all the possible nonzero configurations of normalized r-dimensional column vectors whose components belong to the set $S$, where each configuration appears exactly once and the normalization convention is that the bottom-most nonzero element should be 1.

Observe the following low-dimensional examples. The matrix

$$\begin{pmatrix}1&0&1\\0&1&1\end{pmatrix}$$

is a complete {0,1}-2-matrix; the matrix

$$\begin{pmatrix}1&-1\\1&1\end{pmatrix}$$

is a complete U-2-matrix; the matrix

$$\begin{pmatrix}1&-1&1&-1\\1&1&-1&-1\\1&1&1&1\end{pmatrix}$$

is a complete U-3-matrix; the matrix

$$\begin{pmatrix}1&0&1&0&1&0&1\\0&1&1&0&0&1&1\\0&0&0&1&1&1&1\end{pmatrix}$$

is a complete {0,1}-3-matrix; the matrix

$$\begin{pmatrix}1&-1&j&-j\\1&1&1&1\end{pmatrix}$$

is a complete {1,-1,j,-j}-2-matrix.
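A complete S-r-matrix can be generated mechanically, with its columns ordered by their base-|S| address (top component least significant). In the Python sketch below (the helper name is hypothetical), `symbols[d]` is the element of S encoded by digit d, so {0,1} is passed as `[0, 1]`, U as `[1, -1]` and {1,-1,j,-j} as `[1, -1, 1j, -1j]`. It reproduces the five matrices above.

```python
def complete_matrix(symbols, r):
    """Rows of the complete S-r-matrix, columns in increasing address order.

    symbols[d] is the element of S encoded by digit d (base len(symbols));
    the top component is the least significant digit, the bottom the most.
    """
    base = len(symbols)
    columns = []
    for address in range(base ** r):
        col, a = [], address
        for _ in range(r):                    # decode digits, top row first
            col.append(symbols[a % base])
            a //= base
        nonzero = [c for c in col if c != 0]
        # keep nonzero columns whose bottom-most nonzero entry is 1
        if nonzero and nonzero[-1] == 1:
            columns.append(col)
    return [[col[i] for col in columns] for i in range(r)]
```

For example, `complete_matrix([1, -1, 1j, -1j], 2)` returns `[[1, -1, 1j, -1j], [1, 1, 1, 1]]`, the last matrix above.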

The implementations that follow are based on the assumption that the number of rows is small enough relative to the number of columns that a large proportion of all possible configurations appear in the columns of the transformation matrix. Hence the first iteration made in all the following implementations reduces the computation to the product of a complete S-r-matrix with a modified vector. Consequently, all the more advanced iterations compute the product of a lower-dimensional complete S-matrix with a respective modified vector.

The ordered manner in which the columns appear in the matrices of the above examples reflects another feature of the address settings of these implementations, namely an addressing based on a consistent rule of translating the columns to base-|S| addresses, where the MSB is the bottom component of the vector and the LSB is the top.

The fundamental result is that after the first iteration there is a unique uniform complete 
transformation matrix, with a uniform (and rather obvious) address-numbering. This 
uniform matrix depends only on S and r and not on the initial transformation matrix. This 
leads to uniformity in the rest of the process and hence enables the creation of data-flow 
structures that are universal and independent of the particular transformation matrix after the 
first stage. This is very useful in hardware implementations and in the construction of 
apparatuses based on the present invention, which is the main goal of the current chapter. 
An additional and separate aspect of these implementations is a reduction of the required read and write memory allocation and of the number of additions for a suitable transformation matrix. Consequently, the energy required for the performance of the process and the cost of the apparatus are reduced. It should be observed that the efficiency of each of the following codes is enhanced when it processes the same initial transformation matrix with several input x-vectors.

Finally, to have a proper perspective, the following implementations should be read with the 
understanding that they are a few examples out of many possible efficient implementations 
of the invention. 

Example 1: 0-1-Binary-Matrix 

The following sequence of steps describes an implementation of the 0-1-matrix aspect of the invention. The data consists of an r×n 0-1-matrix $A=(a_{ij}:0\le i<r,\ 0\le j<n)$ and an input real or complex vector $x=(x_0,\dots,x_{n-1})$. The columns of the matrix A are denoted by $v_0,\dots,v_{n-1}$. This sequence of steps computes the product $y=(y_0,\dots,y_{r-1})^T=A\cdot x$. At every given k-iteration each location contains one real or complex number, depending on the vector x. The allocated read and write memory contains $2^r-1$ addresses labeled from 1 to $2^r-1$, representing the columns participating in the process at each iteration. The following definitions are part of the description of the present preferred embodiment of the invention.
Definitions. 

1) Define for all $k\ge 0$, $j\ge 1$: $l(k,j)=\lceil (j-1)\cdot r/2^k\rceil$.

2) Define for all $0\le m<r$: $Y_m=2^m$. These addresses will contain at the end of the process the components of the output vector $y=(y_0,\dots,y_{r-1})^T$, where $Y_0$ is the address of $y_0$, $Y_1$ is the address of $y_1$, etc.

3) For $k\ge 0$ and $j\ge 1$ define as follows the functions $F_{k,j}$, $G_{k,j}$, which control the movements of data from each location to the next. First put $l=l(k,j)$, $m=l(k+1,2j)$, $h=l(k,j+1)-1$.

Next, for each integer $v\ge 0$ define:

$$F_{k,j}(v)=2^m\lfloor v/2^m\rfloor \qquad \text{[Eq. 100]}$$

$$G_{k,j}(v)=v \bmod 2^m \qquad \text{[Eq. 101]}$$

4) For every vector $v=(v_0,\dots,v_{r-1})\in\{0,1\}^r$ define:

$$\pi(v)=\sum_{0\le j<r}2^j v_j.$$

The code. 

1. initialization: Put zero in every address from 1 to $2^r-1$.

2. first stage: for j going from 0 to n-1 do
    if $v_j\ne 0$ then add $x_j$ to the address $\pi(v_j)$

3. main part:
for k going from 0 to $\lceil\log(r)\rceil-1$ do
    for j going from 1 to $2^k$ do
        if $l(k,j+1)-1>l(k,j)$ then
            for p going from 1 to $2^{l(k,j+1)-l(k,j)}-1$ do
                put $v=2^{l(k,j)}\cdot p$
                for (source) address v do
                    if $F_{k,j}(v)\ne 0$ and $F_{k,j}(v)\ne v$ then
                        add the value residing in the (source) address v to the value in the (destination) address $G_{k,j}(v)$, and also add this value (of source address v) to the value in the (destination) address $F_{k,j}(v)$

4. obtaining the output:
at this stage, every address $Y_i$ contains the value of the output component $y_i$, for all $0\le i<r$.
Complexity. 

An elementary step in the present terminology is reading a number from one location of the 
memory, called source, and adding it to a number placed in another location of the memory, 
called destination. 

An apparatus based on the above code would require the following number of elementary steps to compute the main part of the above computation of $A\cdot x$:

$$C^{0,1}_r=2^{r+1}-2r-2$$

This number depends only on r.

The entire computation of $A\cdot x$ thus requires at most the total of

$$C^{0,1}_{n,r}=n+C^{0,1}_r$$

elementary steps.
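A minimal Python sketch of Example 1, assuming the matrix is given as a list of 0/1 rows and the memory is a plain list indexed by address (index 0 is left unused). The function name is hypothetical; besides $y$ it counts the elementary steps of the main part so that they can be compared with $C^{0,1}_r$.

```python
def binary01_transform(A, x):
    """Example 1 on a list-of-rows 0-1 matrix; returns (y, main-part steps)."""
    r, n = len(A), len(A[0])
    mem = [0] * (2 ** r)                  # addresses 1 .. 2^r - 1 (0 unused)

    def l(k, j):                          # l(k, j) = ceil((j - 1) * r / 2^k)
        return -(-(j - 1) * r // 2 ** k)

    # first stage: x_j goes to pi(v_j), top row = least significant bit
    for j in range(n):
        addr = sum(A[i][j] << i for i in range(r))
        if addr:
            mem[addr] += x[j]

    steps = 0
    for k in range((r - 1).bit_length()):         # 0 .. ceil(log2 r) - 1
        for j in range(1, 2 ** k + 1):
            lo, m, h = l(k, j), l(k + 1, 2 * j), l(k, j + 1) - 1
            if h <= lo:                           # single-row block: nothing to split
                continue
            for p in range(1, 2 ** (h - lo + 1)):
                v = 2 ** lo * p
                F = 2 ** m * (v // 2 ** m)        # Eq. 100: upper part of v
                G = v % 2 ** m                    # Eq. 101: lower part of v
                if F != 0 and F != v:             # both halves are nonzero
                    mem[G] += mem[v]
                    mem[F] += mem[v]
                    steps += 2
    return [mem[2 ** i] for i in range(r)], steps  # Y_i = 2^i
```

For any 0-1 matrix the returned count is 22 for r = 4 and 8 for r = 3, matching $2^{r+1}-2r-2$.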

Chart 1: 0-1-Binary-Matrix 

Fig. 1 schematically illustrates the operation of updating the contents of a memory employed by an apparatus which performs the main part of the above code, thus implementing in a preferred fashion the 0-1-binary aspect of the invention, with a 0-1-binary matrix comprising 4 rows. A column of each matrix at each iteration is represented in binary by an address in the memory. The bottom (0 or 1) component (the extreme right-hand component in a horizontal presentation) is the MSB and the top (0 or 1) component (the extreme left-hand component in a horizontal presentation) is the LSB. The addresses $Y_m=2^m$, $0\le m\le 3$, contain at the end of the process the components of the output vector $y=(y_0,\dots,y_3)^T$, where $Y_0$ is the address of $y_0$, $Y_1$ is the address of $y_1$, etc. The arrows signify the act of taking the content of one address and sending it to be added to the content of another address. This is done in accordance with the order of the iterations and the (increasing) order of the addresses participating in each iteration, as indicated by the above code.

Example 2: U-Matrix

The following sequence of steps describes an implementation of the U-matrix aspect of the invention. The data consists of an r×n U-matrix $A=(a_{ij}:0\le i<r,\ 0\le j<n)$ and an input real vector $x=(x_0,\dots,x_{n-1})$. The columns of the matrix A are denoted by $w_0,\dots,w_{n-1}$. This sequence of steps computes the product $y=(y_0,\dots,y_{r-1})^T=A\cdot x$. At every given iteration each location contains one real or complex number, depending on the vector x. The allocated read and write memory contains $2^{r-1}+r-1$ addresses labeled from 0 to $2^{r-1}+r-2$. The following definitions, together with those of the previous example, are required for the description of the present preferred embodiment of the invention.

Definitions.

1) For each r-dimensional U-vector $u=(u_0,\dots,u_{r-1})$ define:

$$\mathrm{Sign}(u)=u_{r-1},\qquad h(u)=(u_0\cdot u_{r-1},\dots,u_{r-1}\cdot u_{r-1}).$$

2) Define a correspondence between the bipolar binary set $U=\{1,-1\}$ and the logic-bits binary set $B=\{0,1\}$ by:

$$(-1)'=1,\qquad 1'=0.$$

Accordingly, for an r-dimensional U-vector $u=(u_0,\dots,u_{r-1})$, define:

$$\pi(u)=\sum_{0\le j<r}2^j u'_j.$$

3) Define $Y_0=0$, and for all $1\le m\le r-1$: $Y_m=2^{r-1}+m-1$. These addresses will contain at the end of the process the components of the output vector $y=(y_0,\dots,y_{r-1})^T$, where $Y_0$ is the address of $y_0$, $Y_1$ is the address of $y_1$, etc.

4) For all $k\ge 0$, $j\ge 1$ the maps $F_{k,j}$, $G_{k,j}$, $\mathrm{Sign}_{k,j}$ are defined as follows. Let $l=l(k,j)$, $m=l(k+1,2j)$, $h=l(k,j+1)-1$. For every integer $v\ge 0$ define:

$$\mathrm{Sign}_{k,j}(v)=1-2\left\lfloor\frac{v \bmod 2^m}{2^{m-1}}\right\rfloor$$

$$F_{k,j}(v)=2^m\lfloor v/2^m\rfloor$$

$$G_{k,j}(v)=\begin{cases}v \bmod 2^m & \text{if } \mathrm{Sign}_{k,j}(v)=1\\ 2^m-2^l-(v \bmod 2^m) & \text{if } \mathrm{Sign}_{k,j}(v)=-1\end{cases}$$

The code.

1. initialization: Put zero in every address from 0 to $2^{r-1}+r-2$.

2. first stage: for j going from 0 to n-1 do
    add $\mathrm{Sign}(w_j)\cdot x_j$ to the address $\pi(h(w_j))$

3. main part:
for k going from 0 to $\lceil\log(r)\rceil-1$ do
    for j going from 1 to $2^k$ do
        if $l(k,j+1)-1>l(k,j)$ then
            1) add the value residing in the (source) address $Y_{l(k,j)}$ to the value in the (destination) address $Y_{l(k+1,2j)}$
            2) for p going from 1 to $2^{l(k,j+1)-l(k,j)-1}-1$ do
                put $u=2^{l(k,j)}\cdot p$ and for source address u do:
                    if $G_{k,j}(u)=u$ then
                        add the value residing in the source address u to the value in the destination address $Y_{l(k+1,2j)}$
                    else if $F_{k,j}(u)=u$ then
                        add the value residing in the (source) address u to the value in the (destination) address $Y_{l(k,j)}$
                    else if $G_{k,j}(u)=0$ then
                        add the value residing in the (source) address u multiplied by (-1) to the value in the (destination) address $Y_{l(k,j)}$, and also add this value (of address u) to the value in the address $F_{k,j}(u)$
                    else
                        add the value residing in the source address u multiplied by $\mathrm{Sign}_{k,j}(u)$ to the value in the destination address $G_{k,j}(u)$, and also add this value (of address u) to the value in the address $F_{k,j}(u)$

4. obtaining the output:
every address $Y_i$ now contains the value of $y_i$, for all $0\le i<r$.
Complexity. 

i) An elementary step in the U-setting with the above implementation is reading a number from one location of the memory, called source, multiplying it by a sign which is 1 or -1, and then adding the result to a number placed in another location of the memory, called destination.

ii) The complexity is formulated with the following terms. For $0\le k\le\lceil\log(r)\rceil-1$ and for $1\le j\le 2^k$ let:

$$U_{k,j}=2^{l(k,j+1)-l(k,j)-1},\qquad U_k=\sum_{1\le j\le 2^k}U_{k,j}$$

For $k=\lceil\log(r)\rceil$ and for $1\le j\le 2^k$ let:

then:

iii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage n elementary steps are required.

(2) For all $0\le k\le\lceil\log(r)\rceil-1$ and $1\le j\le 2^k$,

$$2U_{k,j}-U_{k+1,2j}-U_{k+1,2j-1}+1$$

elementary steps are done in the (k,j) step.

(3) Therefore, for all $0\le k\le\lceil\log(r)\rceil-1$, the k-iteration of the main part requires the following number of elementary steps:

$$2^k+2U_k-U_{k+1}$$

(4) The main part of the above computation of $A\cdot x$ thus requires the following number of elementary steps:

$$C^U_r=2U_0+U_1+U_2+\dots+U_{\lceil\log(r)\rceil-1}+2^{\lceil\log(r)\rceil}-1-r$$

This number depends only on r.

(5) The entire computation of $A\cdot x$ thus requires the total of

$$C^U_{n,r}=n+C^U_r$$

elementary steps.
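The address maps of definition 4) can be checked against their intended meaning: $G_{k,j}(v)$ should be the address of the lower rows $l,\dots,m-1$ of the column stored at address $v$ after they are normalized by their bottom entry, and $\mathrm{Sign}_{k,j}(v)$ should be that bottom entry. The Python sketch below (hypothetical names) compares the closed forms with this decode, normalize and re-encode description; `h_norm` plays the role of $h$.

```python
def pi_u(u):
    """pi(u): (-1)' = 1, 1' = 0, the top component is the least significant bit."""
    return sum(1 << i for i, c in enumerate(u) if c == -1)


def h_norm(u):
    """h(u): multiply every component by the bottom entry u_{r-1}."""
    s = u[-1]
    return [c * s for c in u]


def sign_kj(v, m):
    """Sign_{k,j}(v) = 1 - 2 * floor((v mod 2^m) / 2^(m-1))."""
    return 1 - 2 * ((v % 2 ** m) // 2 ** (m - 1))


def g_kj(v, l, m):
    """G_{k,j}(v), closed form from definition 4)."""
    low = v % 2 ** m
    return low if sign_kj(v, m) == 1 else 2 ** m - 2 ** l - low


def lower_half(v, l, m):
    """The same quantities by definition: decode rows l..m-1, normalize, re-encode."""
    rows = [-1 if (v >> i) & 1 else 1 for i in range(l, m)]
    return pi_u(h_norm(rows)) << l, rows[-1]
```

The closed form $2^m-2^l-(v \bmod 2^m)$ is simply the bitwise complement of bits $l,\dots,m-1$, which is what negating the lower half does to its address.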

Chart 2: U-Matrix

Fig. 2 schematically illustrates the operation of updating the contents of a memory employed by an apparatus which performs the main part of the above code, thus implementing in a preferred fashion the U-binary aspect of the invention, with a U-binary matrix comprising 4 rows. A column of each matrix at each iteration is represented by a binary address in the memory, where a (-1)-component of the column corresponds to 1 in the address, and a 1-component of the column corresponds to 0 in the address. A binary interpretation is applied, where the bottom component (the extreme right-hand component in a horizontal presentation) is the MSB and the top component (the extreme left-hand component in a horizontal presentation) is the LSB. The special addresses $Y_0=0$, $Y_1=8$, $Y_2=9$, $Y_3=10$ contain at the end of the process the components of the output vector $y=(y_0,\dots,y_3)^T$, where $Y_0$ is the address of $y_0$, $Y_1$ is the address of $y_1$, etc. The arrows signify the act of taking the content of one address, multiplying it by a sign which is 1 or -1, and sending the result to be added to the content of another address. An arrow with a single arrowhead signifies that the sign is 1 and an arrow with a double arrowhead signifies that the sign is -1. This is done in accordance with the order of the iterations and the (increasing) order of the addresses participating in each iteration.

Example 3: $U_1$-Matrix
The following sequence of steps describes an implementation of the GEM method in the case where the entries of the transformation matrix belong to the set $U_1=\{1,-1,j,-j\}$. This is one of the subcases of the GEM which appear often in applications. The data consists of an r×n $U_1$-matrix $A=(a_{ij}:0\le i<r,\ 0\le j<n)$ and an input complex vector $x=(x_0,\dots,x_{n-1})$. The columns of the matrix are denoted by $w_0,\dots,w_{n-1}$. This sequence of steps computes the product $y=(y_0,\dots,y_{r-1})^T=A\cdot x$. At every given iteration each location contains one real or complex number, depending on the vector x. The allocated read and write memory contains $4^{r-1}+r-1$ addresses labeled from 0 to $4^{r-1}+r-2$. This example uses the definitions set in the previous examples, as well as the following definitions:
Definitions.

1) Define a correspondence (and inverse correspondence) between the set $U_1$ and the set {0,1,2,3} given by:

$$1'=0,\ 0^*=1;\qquad (-1)'=1,\ 1^*=-1;\qquad j'=2,\ 2^*=j;\qquad (-j)'=3,\ 3^*=-j.$$

Accordingly, for an r-dimensional $U_1$-vector $u=(u_0,\dots,u_{r-1})$ define:

$$\pi(u)=\sum_{0\le j<r}4^j u'_j.$$

2) For each r-dimensional $U_1$-vector $u=(u_0,\dots,u_{r-1})$ define:

$$\mathrm{Sign}(u)=u_{r-1},\qquad g(u)=\big(u_0\cdot(u_{r-1})^{-1},\dots,u_{r-1}\cdot(u_{r-1})^{-1}\big).$$

3) Define $Y_0=0$, and for all $1\le m<r$: $Y_m=4^{r-1}+m-1$. These are the addresses of the components of the output vector.

4) For all $k\ge 0$, $j\ge 1$ the maps $F_{k,j}$, $G_{k,j}$ are defined as follows. Let $l=l(k,j)$, $m=l(k+1,2j)$, $h=l(k,j+1)-1$. Take an integer $v\ge 0$, represented in base 4 by $v=\sum_{0\le i\le r-2}4^i v_i$, and define:

$$\mathrm{Sign}_{k,j}(v)=(v_{m-1})^*=\left(\left\lfloor\frac{v \bmod 4^m}{4^{m-1}}\right\rfloor\right)^*$$

$$F_{k,j}(v)=4^m\lfloor v/4^m\rfloor$$

The code.

1. initialization: Put zero in every address from 0 to $4^{r-1}+r-2$.

2. first stage: for j going from 0 to n-1 add $\mathrm{Sign}(w_j)\cdot x_j$ to the address $\pi(g(w_j))$

3. main part:
for k going from 0 to $\lceil\log(r)\rceil-1$ do
    for j going from 1 to $2^k$ do
        if $l(k,j+1)-1>l(k,j)$ then:
            1) add the value residing in the address $Y_{l(k,j)}$ to the value in the address $Y_{l(k+1,2j)}$
            2) for p going from 1 to $4^{l(k,j+1)-l(k,j)-1}-1$ do
                put $u=4^{l(k,j)}\cdot p$ and for source address u do:
                    if $G_{k,j}(u)=u$ then
                        add the value residing in the (source) address u to the value in the (destination) address $Y_{l(k+1,2j)}$
                    else if $F_{k,j}(u)=u$ then
                        add the value residing in the (source) address u to the value in the (destination) address $Y_{l(k,j)}$
                    else if $G_{k,j}(u)=0$ then
                        add the value residing in the (source) address u multiplied by $\mathrm{Sign}_{k,j}(u)$ to the value in the (destination) address $Y_{l(k,j)}$, and also add this value (of address u) to the value in the address $F_{k,j}(u)$
                    else
                        add the value residing in the (source) address u multiplied by $\mathrm{Sign}_{k,j}(u)$ to the value in the (destination) address $G_{k,j}(u)$, and also add this value (of address u) to the value in the address $F_{k,j}(u)$

4. obtaining the output:
every address $Y_i$ now contains the value of $y_i$, for all $0\le i<r$.
Complexity.

i) An elementary step in the $U_1$-setting with the above implementation means reading a complex number from one location of the memory, called source, multiplying it by a complex sign which is 1, -1, j or -j, and adding the result to a complex number placed in another location of the memory, called destination.

ii) The complexity is formed by the following terms. For $0\le k\le\lceil\log(r)\rceil-1$ and for $1\le j\le 2^k$ let:

$$w_{k,j}=4^{l(k,j+1)-l(k,j)-1},\qquad w_k=\sum_{1\le j\le 2^k}w_{k,j}$$

For $k=\lceil\log(r)\rceil$ and for $1\le j\le 2^k$ let:

iii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage n elementary steps are required.

(2) For all $0\le k\le\lceil\log(r)\rceil-1$ and $1\le j\le 2^k$, the total of

$$2w_{k,j}-w_{k+1,2j}-w_{k+1,2j-1}+1$$

elementary steps are done in the (k,j) step.

(3) Therefore, for all $0\le k\le\lceil\log(r)\rceil-1$, the k-iteration of the main part requires the following number of elementary steps:

$$2^k+2w_k-w_{k+1}$$

(4) The main part of the above computation of $A\cdot x$ thus requires the following number of elementary steps:

$$C^{U_1}_r=2w_0+w_1+w_2+\dots+w_{\lceil\log(r)\rceil-1}+2^{\lceil\log(r)\rceil}-1-r$$

This number depends only on r.

(5) The entire computation of $A\cdot x$ thus requires the total of

$$C^{U_1}_{n,r}=n+C^{U_1}_r$$

elementary steps.
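Both the U and the $U_1$ procedures can be exercised with one Python sketch. It is our own illustration, not the patented apparatus: the function name is hypothetical, $G_{k,j}$ is computed by decoding the lower rows, dividing them by their bottom entry and re-encoding (instead of a closed form), and an all-ones half is always routed to $Y_{l(k,j)}$ (lower half) or $Y_{l(k+1,2j)}$ (upper half), so the case $F_{k,j}(u)=0$ is treated like $G_{k,j}(u)=0$. Sources are visited in increasing address order; this is safe because every destination other than a Y address is smaller than its source, and the Y addresses are not read inside the loop. For r = 4 the returned step count should equal $C^{U_1}_4=2\cdot 64+8+4-1-4=135$ (and 19 for the U alphabet, counted the same way).

```python
def unit_matrix_transform(A, x, symbols):
    """U / U_1 scheme on a list-of-rows matrix; returns (y, main-part steps).

    symbols[d] is the entry encoded by digit d: [1, -1] for U,
    [1, -1, 1j, -1j] for U_1.  The top row is the least significant digit.
    """
    r, n = len(A), len(A[0])
    base = len(symbols)
    digit = {s: d for d, s in enumerate(symbols)}     # the ' correspondence
    top = base ** (r - 1)                             # number of column addresses
    Y = [0] + [top + m - 1 for m in range(1, r)]      # output addresses Y_m
    mem = [0] * (top + r - 1)

    def l(k, j):                                      # ceil((j - 1) * r / 2^k)
        return -(-(j - 1) * r // 2 ** k)

    def encode(vec, lo):                              # rows lo, lo+1, ... -> address
        return sum(digit[c] * base ** (lo + i) for i, c in enumerate(vec))

    # first stage: w_j = Sign(w_j) * g(w_j), where g(w_j) has bottom entry 1
    for j in range(n):
        w = [A[i][j] for i in range(r)]
        s = w[-1]
        mem[encode([c * s.conjugate() for c in w], 0)] += s * x[j]

    steps = 0
    for k in range((r - 1).bit_length()):             # 0 .. ceil(log2 r) - 1
        for j in range(1, 2 ** k + 1):
            lo, m, end = l(k, j), l(k + 1, 2 * j), l(k, j + 1)
            if end - lo < 2:                          # single-row block
                continue
            mem[Y[m]] += mem[Y[lo]]                   # step 1): all-ones column
            steps += 1
            for p in range(1, base ** (end - lo - 1)):
                u = base ** lo * p
                val = mem[u]
                lower = [symbols[u // base ** i % base] for i in range(lo, m)]
                s = lower[-1]                         # Sign_{k,j}(u)
                g = encode([c * s.conjugate() for c in lower], lo)  # G_{k,j}(u)
                f = base ** m * (u // base ** m)      # F_{k,j}(u)
                # an all-ones half is kept at Y_lo (lower) or Y_m (upper)
                for dest, factor in ((g or Y[lo], s), (f or Y[m], 1)):
                    if dest != u:                     # the half that stays put costs nothing
                        mem[dest] += factor * val
                        steps += 1
    return [mem[Y[i]] for i in range(r)], steps
```

Multiplying by the conjugate of a unit-modulus entry is the same as dividing by it, which keeps integer inputs exact for both alphabets.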

Example 4: Toeplitz U-Matrix

The following sequence of steps describes an implementation of the Toeplitz-matrix aspect of the invention with U-coefficients. The data consists of a U-sequence $t_0,\dots,t_{n-1}$ and an input real or complex vector $x=(x_0,\dots,x_{n+r-2})$. From the U-sequence an $r\times(n+r-1)$ Toeplitz matrix $A=(a_{ij}=t_{j-i}:\ 0\le i<r,\ 0\le j\le n+r-2)$ is formed, where $t_k=0$ for all $k<0$ or $k\ge n$. These steps compute the product $y=(y_0,\dots,y_{r-1})^T=A\cdot x$. Only the first stage differs from that of the U-matrix example, so only this stage needs to be introduced. All the definitions listed in the former examples of implementations are applicable here.
The code.

1. initialization: Put zero in every address from 0 to $2^{r-1}+r-2$.

2. first stage:

1) for j going from 0 to r-2 add $\tfrac12 t_{n-r+j+1}x_j$ to the address

$$\pi\big(h(t_j,t_{j-1},\dots,t_1,t_0,t_{n-1},t_{n-2},\dots,t_{n-r+j+1})\big)$$

and also add $-\tfrac12 t_{n-r+j+1}x_j$ to the address

$$\pi\big(h(t_j,t_{j-1},\dots,t_1,t_0,-t_{n-1},-t_{n-2},\dots,-t_{n-r+j+1})\big)$$

2) for j going from r-1 to n-1 add $t_{j-r+1}x_j$ to the address

$$\pi\big(h(t_j,t_{j-1},\dots,t_{j-r+1})\big)$$

3) for j going from n to n+r-2 add $\tfrac12 t_{j-r+1}x_j$ to the address

$$\pi\big(h(t_{j-n},t_{j-n-1},\dots,t_1,t_0,t_{n-1},t_{n-2},\dots,t_{j-r+1})\big)$$

and also add $\tfrac12 t_{j-r+1}x_j$ to the address

$$\pi\big(h(t_{j-n},t_{j-n-1},\dots,t_1,t_0,-t_{n-1},-t_{n-2},\dots,-t_{j-r+1})\big)$$

3. main part: The algorithm now proceeds as in the main part of the U-matrix algorithm, and the output is stored in the same addresses.

Complexity.

i) An elementary step in the above Toeplitz implementation is the same as in the U-implementation, that is, reading a number from one location of the memory, called source, multiplying it by a sign which is 1 or -1, and then adding the result to a number placed in another location of the memory, called destination.

ii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage

$$2\cdot(r-1)+(n-r+1)+2\cdot(r-1)=n+3r-3$$

elementary steps are required.

(2) The main part of the Toeplitz computation of $A\cdot x$ is done by an implementation of the main part of the U-code with r lines. Thus it requires $C^U_r$ elementary steps.

(3) The entire Toeplitz computation of $A\cdot x$ thus requires the total of

$$n+3r-3+C^U_r$$

elementary steps.
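The device used in this first stage, writing each column that contains zeros as the average of two U-vectors, can be checked numerically. The Python sketch below uses hypothetical function names and follows steps 1) to 3) to fill the U-address table; in place of the main part it multiplies the complete U-r-matrix by the table directly. The result should equal $A\cdot x$ with $a_{ij}=t_{j-i}$.

```python
def toeplitz_first_stage(t, x, r):
    """Example 4 first stage: returns the U-address table (sketch).

    t = (t_0, ..., t_{n-1}) with entries +1/-1, x has n + r - 1 entries,
    and column j of A is (t_j, t_{j-1}, ..., t_{j-r+1}) with t_k = 0 outside 0..n-1.
    """
    n = len(t)
    mem = [0.0] * (2 ** (r - 1))                 # one slot per normalized column

    def deposit(u, coeff):                       # add coeff to address pi(h(u))
        s = u[-1]
        mem[sum(1 << i for i, c in enumerate(u) if c * s == -1)] += coeff

    for j in range(r - 1):                       # step 1): zeros at the bottom
        top = [t[j - i] for i in range(j + 1)]                 # t_j .. t_0
        fill = [t[n - 1 - i] for i in range(r - 1 - j)]        # t_{n-1} .. t_{n-r+j+1}
        deposit(top + fill, 0.5 * fill[-1] * x[j])
        deposit(top + [-c for c in fill], -0.5 * fill[-1] * x[j])
    for j in range(r - 1, n):                    # step 2): no zeros
        col = [t[j - i] for i in range(r)]
        deposit(col, col[-1] * x[j])
    for j in range(n, n + r - 1):                # step 3): zeros at the top
        fill = [t[j - n - i] for i in range(j - n + 1)]        # t_{j-n} .. t_0
        bottom = [t[n - 1 - i] for i in range(n + r - 1 - j)]  # t_{n-1} .. t_{j-r+1}
        deposit(fill + bottom, 0.5 * bottom[-1] * x[j])
        deposit(fill + [-c for c in bottom], 0.5 * bottom[-1] * x[j])
    return mem


def complete_u_times(mem, r):
    """Stand-in for the main part: complete U-r-matrix times the table."""
    return [sum(v * (-1 if (a >> i) & 1 else 1) for a, v in enumerate(mem))
            for i in range(r)]
```

The two halves of each edge column differ only in the filler rows, so the filler cancels and the genuine entries add up to the original column.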

Chart 3: Toeplitz-Matrix

Fig. 3 schematically illustrates the operation of sending the incoming data to appropriate memory locations, employed by an apparatus which performs the initial part of the above code, with a Toeplitz matrix comprising 4 rows. Except for this initial part, every other aspect is identical to that of the U-matrix implementation and apparatus, and the description there is applicable here. Here too, an arrow with a single arrowhead signifies that the sign is 1 and an arrow with a double arrowhead signifies that the sign is -1.

Example 5: Toeplitz $U_1$-Matrix

The following preferred embodiment of the invention is an implementation of the Toeplitz-matrix aspect of the invention with $U_1$-coefficients. The data consists of a $U_1$-sequence $t_0,\dots,t_{n-1}$ and an input complex vector $x=(x_0,\dots,x_{n+r-2})$. From the $U_1$-sequence an $r\times(n+r-1)$ Toeplitz matrix $A=(a_{ij}=t_{j-i}:\ 0\le i<r,\ 0\le j\le n+r-2)$ is formed, where $t_k=0$ for all $k<0$ or $k\ge n$. The sequence of steps listed below computes the product $y=(y_0,\dots,y_{r-1})^T=A\cdot x$. Only the first stage differs from that of the $U_1$-matrix example, therefore only this stage needs to be described. All the definitions listed in previous examples are applicable here.
The code:
1. initialization: Put zero in every address from 0 to $4^{r-1}+r-2$.

2. first stage:

1) for j going from 0 to r-2 add $\tfrac12 t_{n-r+j+1}x_j$ to the address

$$\pi\big(g(t_j,t_{j-1},\dots,t_1,t_0,t_{n-1},t_{n-2},\dots,t_{n-r+j+1})\big)$$

and also add $-\tfrac12 t_{n-r+j+1}x_j$ to the address

$$\pi\big(g(t_j,t_{j-1},\dots,t_1,t_0,-t_{n-1},-t_{n-2},\dots,-t_{n-r+j+1})\big)$$

2) for j going from r-1 to n-1 add $t_{j-r+1}x_j$ to the address

$$\pi\big(g(t_j,t_{j-1},\dots,t_{j-r+1})\big)$$

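3) for j going from n to n+r-2 add $\tfrac12 t_{j-r+1}x_j$ to the address

$$\pi\big(g(t_{j-n},t_{j-n-1},\dots,t_1,t_0,t_{n-1},t_{n-2},\dots,t_{j-r+1})\big)$$

and also add $\tfrac12 t_{j-r+1}x_j$ to the address

$$\pi\big(g(t_{j-n},t_{j-n-1},\dots,t_1,t_0,-t_{n-1},-t_{n-2},\dots,-t_{j-r+1})\big)$$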

and also add $-\tfrac12 t_j x_j$ to the address

$$\pi\big(g(t_{j-n},t_{j-n-1},\dots,t_1,-t_0,-t_{n-1},-t_{n-2},\dots,-t_{j-r+2})\big)$$

3. main part: proceed as in the main part of the $U_1$-matrix algorithm; the output is obtained in the same way.

Complexity. 

i) An elementary step in the above Toeplitz implementation is the same as in the $U_1$-implementation, that is, reading a complex number from one location of the memory, called source, multiplying it by a complex sign which is 1, -1, j or -j, and then adding the result to a complex number placed in another location of the memory, called destination.

ii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage

$$2\cdot(r-1)+(n-r+1)+2\cdot(r-1)=n+3r-3$$

elementary steps are required.

(2) The main part of the Toeplitz computation of $A\cdot x$ is done by an implementation of the main part of the $U_1$-code with r lines. Thus it requires $C^{U_1}_r$ elementary steps.

(3) The entire Toeplitz computation of $A\cdot x$ thus requires the total of

$$n+3r-3+C^{U_1}_r$$

elementary steps.
Example 6: Real Matrix, binary 0-1 representation.

The following preferred embodiment of the invention is an implementation of the real-matrix aspect of the invention with a binary 0-1 representation of the matrix entries. The data consists of an r×n real matrix $A=(a_{ij}:0\le i<r,\ 0\le j<n)$ with nonnegative entries and an input real or complex vector $x=(x_0,\dots,x_{n-1})$. The algorithm computes $y=(y_0,\dots,y_{r-1})^T=A\cdot x$.

It is assumed that the entries of the matrix are given by the following 0-1-binary representation, for all $0\le i<r$, $0\le j<n$:

$$a_{ij}=\sum_{-m_2\le k\le m_1}a_{ijk}\,2^k,\qquad\text{where } a_{ijk}\in\{0,1\}\text{ for all } -m_2\le k\le m_1.$$

Only the first stage differs from that of the 0-1-matrix implementation, therefore only this stage needs to be described. All the definitions listed in the 0-1-matrix implementation section are applicable here. In addition, define the following r-dimensional {0,1}-vectors for all $0\le j<n$, $-m_2\le k\le m_1$:

$$v_{jk}=(a_{0jk},a_{1jk},\dots,a_{r-1,jk})$$

1. initialization: Put zero in every address from 0 to $2^r-1$.

2. first stage:
for a counter j going from 0 to n-1 do
    for k going from $-m_2$ to $m_1$ do
        add $2^k\cdot x_j$ to the value in address $\pi(v_{jk})$

3. main part: proceed as in the main part of the {0,1}-matrix implementation algorithm for r rows; the output is stored in the same addresses at the end of the computational process.
Complexity. 

i) An elementary step in the above implementation is the same as in the 0-1-binary implementation, that is, reading a number from one location of the memory, called source, and adding it to a number placed in another location of the memory, called destination.

ii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage

$$(m_1+m_2+1)\cdot n$$

elementary steps are required.

(2) The main part of the code is done by the main part of the above 0-1-binary code with r lines. Thus it requires $C^{0,1}_r$ elementary steps.

(3) The entire computation of $A\cdot x$ thus requires the total of

$$(m_1+m_2+1)\cdot n+C^{0,1}_r$$

elementary steps.
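A small Python sketch of this first stage (hypothetical names, our own illustration). It assumes every entry is a multiple of $2^{-m_2}$ supplied as the integer $a_{ij}\cdot 2^{m_2}$, so that bit $k+m_2$ holds $a_{ijk}$, and instead of running the main part it multiplies the complete {0,1}-r-matrix by the resulting table directly; the result should reproduce $A\cdot x$.

```python
def real01_first_stage(a_scaled, x, m1, m2):
    """First stage of Example 6 (sketch).

    a_scaled[i][j] = a_ij * 2**m2, a nonnegative integer below 2**(m1 + m2 + 1),
    so bit (k + m2) of a_scaled[i][j] is the digit a_ijk of 2^k.
    """
    r = len(a_scaled)
    mem = [0.0] * (2 ** r)                 # address 0 absorbs all-zero bit columns
    for j, xj in enumerate(x):
        for k in range(-m2, m1 + 1):
            bit = k + m2
            # pi(v_jk): row i contributes bit i of the address
            addr = sum(((a_scaled[i][j] >> bit) & 1) << i for i in range(r))
            mem[addr] += 2.0 ** k * xj
    return mem


def complete01_times(mem, r):
    """Stand-in for the main part: complete {0,1}-r-matrix times the table."""
    return [sum(v for a, v in enumerate(mem) if (a >> i) & 1) for i in range(r)]
```

Each column costs $m_1+m_2+1$ additions in this stage, one per binary digit, in line with item (1) above.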

Example 7: Real Matrix, binary U-representation.

The following preferred embodiment of the invention is an implementation of the real-matrix aspect of the invention with a binary U-representation of the matrix entries. The data consists of an r×n real matrix $A=(a_{ij}:0\le i<r,\ 0\le j<n)$ and an input real or complex vector $x=(x_0,\dots,x_{n-1})$. The algorithm computes $y=(y_0,\dots,y_{r-1})^T=A\cdot x$.

It is assumed that the entries of the matrix are given by the following U-representation, for all $0\le i<r$, $0\le j<n$:

$$a_{ij}=\sum_{-m_2-1\le k\le m_1-1}t_{ijk}\,2^k+t_{ijm_1}\big(2^{m_1}-2^{-m_2-1}\big),\qquad\text{where } t_{ijk}\in U\text{ for all } -m_2-1\le k\le m_1.$$

The existence of this representation is already mentioned in the real-matrix aspect of the invention. Only the first stage differs from that of the U-matrix implementation, therefore only this stage needs to be described. All the definitions listed in the U-matrix implementation section are applicable here. In addition, define the following r-dimensional U-vectors for all $0\le j<n$, $-m_2-1\le k\le m_1$:

$$u_{jk}=(t_{0jk},t_{1jk},\dots,t_{r-1,jk})$$

1. initialization: Put zero in every address from 0 to $2^{r-1}+r-2$.

2. first stage:
for j going from 0 to n-1 do
    for k going from $-m_2-1$ to $m_1-1$ do
        add $2^k\cdot\mathrm{Sign}(u_{jk})\cdot x_j$ to the address $\pi(h(u_{jk}))$
    for $k=m_1$ do
        add $(2^{m_1}-2^{-m_2-1})\cdot\mathrm{Sign}(u_{jk})\cdot x_j$ to the address $\pi(h(u_{jk}))$

3. main part: proceed as in the main part of the U-matrix implementation algorithm for r rows; the output is stored in the same addresses at the end of the computational process.
Complexity. 

i) An elementary step in the above implementation is the same as in the above U-implementation, that is, reading a number from one location of the memory, called source, multiplying it by a sign which is 1 or -1, and then adding the result to a number placed in another location of the memory, called destination.

ii) An apparatus based on the above code would require the following number of elementary steps to fulfil the computation of $A\cdot x$.

(1) For the first stage

$$(m_1+m_2+2)\cdot n$$

elementary steps are required.

(2) The main part of the code is done by the main part of the above implementation of the U-code with r lines. Thus it requires $C^U_r$ elementary steps.

(3) The entire computation of $A\cdot x$ thus requires the total of

$$C^{Re\text{-}U}_{n,r}=(m_1+m_2+2)\cdot n+C^U_r$$

elementary steps.
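The U-representation assumed in this example can be computed digit by digit. The Python sketch below is our own construction, not part of the described embodiment: it takes an entry that is a multiple of $2^{-m_2}$, passed as the integer $a\cdot 2^{m_2}$, fixes $t_{m_1}$ from the sign of $a$, and then chooses the remaining signs greedily from the largest weight down. It covers every such entry with $|a|\le 2^{m_1+1}-2^{-m_2}$.

```python
def u_digits(a_scaled, m1, m2):
    """U-digits t_k (k = -m2-1 .. m1) of a = a_scaled / 2**m2 (sketch).

    Works in integer units of 2^(-m2-1):
    a = sum_{k<m1} t_k 2^k + t_{m1} (2^m1 - 2^(-m2-1)).
    """
    R = 2 * a_scaled                        # a in units of 2^(-m2-1)
    c = 2 ** (m1 + m2 + 1) - 1              # 2^m1 - 2^(-m2-1) in those units
    if abs(R) > 2 * c:
        raise ValueError("entry outside the representable range")
    t = {m1: 1 if R >= 0 else -1}
    R -= t[m1] * c                          # now odd, with |R| <= c
    for k in range(m1 - 1, -m2 - 2, -1):    # greedy, largest weight first
        t[k] = 1 if R > 0 else -1
        R -= t[k] * 2 ** (k + m2 + 1)       # weight 2^k in the same units
    assert R == 0                           # every step keeps |R| <= next weight sum
    return t
```

For example, with m1 = 2 and m2 = 1 every multiple of 1/2 in [-7.5, 7.5] gets a valid set of m1 + m2 + 2 = 5 signs, which is also the number of first-stage additions per column counted in item (1).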

Fig. 4 is a block diagram of an exemplary apparatus for performing a linear transformation with a reduced number of additions, according to a preferred embodiment of the invention. The apparatus 500 consists of a multiplier 10 with two inputs and one output, a multiplexer (MUX) 9 for selecting one of its inputs to be transferred to its output, and an adder 11 with two inputs and one output, followed by a Dual Port Random Access Memory (DPRAM) 13 having two address bus lines, "add_a" and "add_b", and two outputs, "data_a" and "data_b". The MUX activity is controlled by an address generator 501, which also enables access to memory addresses in the DPRAM 13. The address generator activity is controlled by a counter 3.
The output of the multiplier 10 is connected to one input, "C", of the MUX 9. The output of the MUX 9 is connected to one input, "A", of the adder 11. The output of the adder 11 is connected to the input of the DPRAM 13. One output, "data_a", of the DPRAM 13 is connected to the input "B" of the adder 11. The other output, "data_b", of the DPRAM 13 is connected to the input "E" of a multiplier 12. The other input, "F", of the multiplier 12 is connected to the output "sign" of the address generator 501. The output of the multiplier 12 is connected to the other input, "D", of the MUX 9. The counter 3 is connected to two inputs of the address generator 501. The output "H" of the generator 501 is connected to the "sign" input of the multiplier 12. The output "G" of the generator 501 is connected to the control input "S" of the MUX 9. The output "J" of the generator 501 is connected to the first address input "add_a" of the DPRAM 13. The other output, "I", of the generator 501 is connected to the second address input "add_b" of the DPRAM 13. The transformation matrix may be stored in an optional RAM/ROM 1, which feeds the address generator 501 with the series of codes (a row of bits in the matrix) $D_0, D_1, \ldots, D_s$, and feeds the multiplier 10 with the most significant bit $D_0$. The input vector may be stored in another optional RAM/ROM 2, which feeds the multiplier 10 with samples of the input signal. Alternatively, the storage memories 1 and 2 can be eliminated if the elements of the input vector and their corresponding elements in the transformation matrix are provided synchronously to the apparatus 500; for example, these elements may be provided by an ADC or a sequence generator. All the components of apparatus 500 are controlled by a common clock via "clock_in" inputs.

Using the same input set $[x_1, x_2, \ldots, x_n]$ and the same U-matrix (comprising n rows) representing the same codes $D_0, D_1, \ldots, D_s$, the operation of the apparatus is explained below. The operation of the apparatus 500 may be divided into two stages: the first stage, indicated as "Stage 1", during which the input data (i.e., the samples $[x_1, x_2, \ldots, x_n]$) is received and the product of each component and its corresponding element of the transformation matrix is calculated and stored in the DPRAM 13; and the second stage, indicated as "Stage 2", during which the received data is processed while further input data is blocked from entering the adder 11. The counter 3 counts the accumulated number of operations, and its output (the count) is used to distinguish between the two stages. The number of operations of "Stage 1" is the length n of the input vector; operations beyond this number are associated with "Stage 2".
According to a preferred embodiment of the invention, the address generator 501 comprises a comparator 4, which is linked to the output of the counter 3. The comparator reads the current output of the counter 3, compares it with the length n of the input vector, and provides a corresponding signal indicating whether the current operation is carried out in "Stage 1" or in "Stage 2". This signal controls the input "S" of the MUX 9, so as to switch between the inputs "C" and "D".

The address generator 501 also comprises an asynchronous memory 5 (e.g., a ROM), which stores preprogrammed values and is used as a Look-Up Table (LUT) for determining, via the address inputs "add_a" and "add_b", the addresses in the DPRAM 13 whose contents are to be processed. The size of the LUT is $(C-n) \times (1+2r)$, and its contents comprise three fields: a "Source" field, a "Sign" field and a "Destination" field. For each operation (i.e., each clock cycle, counted by the counter 3), there is a corresponding triple of source, sign and destination values. The source field comprises information related to each column of the transformation matrix that has been split (divided), as well as an indication related to the normalization of the two parts of that specific column. The source field determines the value of the input "add_b" of the DPRAM 13. The sign field represents the lower component in each split column or sub-column. The destination field determines the value of the input "add_a" of the DPRAM 13, according to which the content of the corresponding address is selected for processing.
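The text fixes only the shape of the LUT: $(C-n)$ words of $1+2r$ bits, each holding a sign, a source and a destination. The sketch below packs and unpacks one such word. The field order (sign in bit 0, source above it, destination on top) and the sign encoding (0 for +1, 1 for -1, mirroring the matrix bit convention) are assumptions, and the helper names `pack_lut_word`, `unpack_lut_word` and `lut_size_bits` are hypothetical.

```python
from typing import Tuple


# Hypothetical LUT word layout (an assumption; the text gives only the widths):
#   bit 0          : sign bit (0 -> +1, 1 -> -1)
#   bits 1 .. r    : source address (drives add_b)
#   bits r+1 .. 2r : destination address (drives add_a)
def pack_lut_word(sign: int, source: int, destination: int, r: int) -> int:
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    limit = 1 << r
    if not (0 <= source < limit and 0 <= destination < limit):
        raise ValueError("addresses must fit in r bits")
    sign_bit = 0 if sign == 1 else 1
    return sign_bit | (source << 1) | (destination << (r + 1))


def unpack_lut_word(word: int, r: int) -> Tuple[int, int, int]:
    """Return (sign, source, destination) from a packed LUT word."""
    mask = (1 << r) - 1
    sign = 1 if (word & 1) == 0 else -1
    source = (word >> 1) & mask
    destination = (word >> (r + 1)) & mask
    return sign, source, destination


def lut_size_bits(C: int, n: int, r: int) -> Tuple[int, int]:
    """(number of words, bits per word) of a (C - n) x (1 + 2r) LUT."""
    return C - n, 1 + 2 * r


word = pack_lut_word(-1, 5, 9, r=4)
print(unpack_lut_word(word, r=4))  # (-1, 5, 9)
```

An r-bit field is enough for every DPRAM address used here, since the largest one, $2^{r-1}+r-2$, is below $2^r$ for every $r \ge 1$.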

The address generator also comprises a set of s (s = r-1) inverters, 6_1, 6_2, ..., 6_s, each of which has an input connected to the corresponding bit of the series $D_1, \ldots, D_s$. The output of each inverter is connected to one input of a corresponding MUX from a set of s multiplexers, 7_1, 7_2, ..., 7_s. The bits $D_1, \ldots, D_s$ are also fed into the other input of the corresponding MUX from the set 7_1, 7_2, ..., 7_s. The output of each MUX in the set 7_1, 7_2, ..., 7_s is controlled by the value of the most significant bit $D_0$, so as to transfer the set $D_1, \ldots, D_s$ either unchanged or inverted (i.e., $D'_1, \ldots, D'_s$) towards the input "add_a" of the DPRAM 13. The set $D_1, \ldots, D_s$ (or $D'_1, \ldots, D'_s$) is fed into one input of a corresponding set 8_1, 8_2, ..., 8_s of s multiplexers, together with an additional multiplexer 8_r, which is fed by the MSB (the r-th bit) of "add_a" arriving from the LUT. The output of the comparator 4 selects whether "add_a" arrives from the set 7_1, 7_2, ..., 7_s or from the LUT. The input "add_a" (i.e., the destination) controls the first output "data_a" of the DPRAM 13, which is fed into input "B" of the adder 11. The input "add_b" taken from the LUT (i.e., the source) controls the second output "data_b" of the DPRAM 13, which is fed into input "E" of the multiplier 12; the multiplier 12 multiplies each value of "data_b" by the corresponding "sign" value extracted from the LUT and feeds the product into the input "D" of the MUX 9. The "write" operation in the DPRAM 13 is synchronous, i.e., the content of each cell is overwritten at the clock rate (e.g., on the rising clock edge). The "read" operation in the DPRAM 13, on the other hand, is asynchronous, i.e., each output changes as soon as its address input changes, regardless of the clock.
Operation at Stage 1: 

At this stage, the counter 3 starts to count the first symbol time (n clock cycles), during which stage 1 is performed. During stage 1, the MUX 9 passes data from its input "C" to its output and into input "A" of the adder 11, while blocking input "D".

The input symbol $[x_1, x_2, \ldots, x_n]$ is provided to one input of the multiplier 10, and the MSB $D_0$ of the code is provided to the other input of the multiplier 10. At this stage, the destination content (output "data_a"), determined by the input "add_a", is read from the DPRAM 13 and added to the components $[x_1, x_2, \ldots, x_n]$ of the input vector, multiplied by the MSB $D_0$. When the counter 3 has counted the current symbol time, it indicates to the comparator 4 of the address generator and to the ROM 5 that the current symbol time has ended, and the comparator 4 switches the input selection of the MUX 9 from input "C" to the other input "D". Similarly, the comparator 4 drives the multiplexers 8_1, 8_2, ..., 8_s to select data from the LUT rather than from the RAM 1.
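Stage 1 can be modelled in software as below, as a minimal behavioural sketch rather than the patented circuit. Each column j arrives as its stored bits $D_0, \ldots, D_s$ (0 for +1, 1 for -1). The multiplier 10 applies the sign encoded by $D_0$, the inverters and MUXes 7 form $D_i \oplus D_0$ (the column normalized so its first entry is +1), the MUX 8_r forces the address MSB to 0, and the adder 11 accumulates into that address. The bit order (with $D_1$ as the least significant address bit) is an assumption, since the text does not fix it, and `stage1` is a hypothetical name.

```python
from typing import List, Sequence


def stage1(columns: Sequence[Sequence[int]], x: Sequence[float], r: int) -> List[float]:
    """Behavioural model of Stage 1 (hypothetical helper, not from the source).

    columns[j] holds the stored bits D_0 .. D_(r-1) of column j of the U-matrix,
    with 0 meaning +1 and 1 meaning -1. Assumption: D_1 drives the least
    significant address bit. Returns the DPRAM contents after n cycles.
    """
    if len(columns) != len(x):
        raise ValueError("need one column of bits per input component")
    s = r - 1
    mem = [0.0] * ((1 << s) + r - 1)  # DPRAM of 2^(r-1)+r-1 words, zeroed
    for bits, xj in zip(columns, x):
        if len(bits) != r:
            raise ValueError("each column must have r bits")
        d0 = bits[0]
        # Multiplier 10: x_j times the sign encoded by D_0.
        product = -xj if d0 else xj
        # Inverters 6_i and MUXes 7_i: pass D_i, or its inverse when D_0 = 1,
        # so the address encodes the column normalized to start with +1.
        addr = 0
        for i in range(1, r):
            addr |= (bits[i] ^ d0) << (i - 1)
        # MUX 8_r supplies 0 as the address MSB, so addr < 2^(r-1).
        # Adder 11: data_a + product, written back to the same address.
        mem[addr] += product
    return mem


# Four columns of a 3 x 4 U-matrix: [+,+,+], [-,-,-], [+,-,+], [-,+,-].
cols = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
print(stage1(cols, [1, 2, 3, 4], r=3))  # [-1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
```

In this example the first two columns both normalize to [+,+,+] and share address 0 (1 - 2 = -1), while the last two both normalize to [+,-,+] and share address 1 (3 - 4 = -1). Every output can then be formed from these cells alone, e.g. $y_0 = 1 - 2 + 3 - 4 = -2$ is the sum of the two cells.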
Operation at Stage 2: 

At this stage, the counter 3 starts to count the next symbol time, during which stage 2 is performed. During stage 2, the MUX 9 passes data from its input "D" to its output and into input "A" of the adder 11, while blocking input "C". At each clock cycle of this stage, a selected address in the DPRAM 13, representing the source, is accessed via "add_b". The source data is fed from the second output "data_b" of the DPRAM 13 into the input "E" of the multiplier 12, where it is multiplied by the corresponding "sign" value extracted from the LUT. This product is fed into the input "D" of the MUX 9 and thus appears at the input "A" of the adder 11. At the same time, the content of the current destination address appears at the input "B" of the adder 11. The two values are added by the adder 11, and the result is stored at the same address in the DPRAM 13 (i.e., the current destination). At the end of this summation process, the r transformation points (the $y_i$'s) are stored in the DPRAM 13 at address #0 and at addresses #$2^{r-1}$ to #$(2^{r-1}+r-2)$. The process continues until the counter 3 indicates (according to a predetermined count) that stage 2 has terminated, after all the transformation points have been calculated and stored in their addresses in the DPRAM 13. At that moment the current symbol time has also ended, and the comparator 4 switches the input selection of the MUX 9 back from input "D" to input "C" and drives the multiplexers 8_1, 8_2, ..., 8_s to select data from RAM 1 rather than from the LUT, thereby returning the apparatus 500 to stage 1.
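The reduced-addition LUT schedule itself is not spelled out in this section, so the sketch below drives the Stage 2 datapath with a deliberately naive, hypothetical LUT instead. That LUT sums each output directly from the Stage 1 cells. It shows how (source, sign, destination) triples move data and where the outputs land (address 0 and addresses $2^{r-1}$ to $2^{r-1}+r-2$). It is not the schedule of the invention, which aims at far fewer steps. It assumes address bit $i-1$ encodes row $i$ of the normalized column (0 for +1), with row 0 always +1 after normalization. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (source, sign, destination), one LUT row


def naive_stage2_lut(r: int) -> List[Triple]:
    """Hypothetical, unoptimized LUT for an r-row U-matrix.

    y_0 ends at address 0 and y_i at address 2^(r-1) + i - 1. Each y_i is
    summed separately, so this uses many more steps than a reduced schedule.
    """
    s = r - 1
    groups = 1 << s  # Stage 1 cells 0 .. 2^(r-1) - 1
    lut: List[Triple] = []
    for i in range(1, r):
        dest = groups + i - 1
        for a in range(groups):
            # Row i of normalized pattern a is -1 exactly when bit i-1 of a is set.
            sign = -1 if (a >> (i - 1)) & 1 else 1
            lut.append((a, sign, dest))
    # y_0 is accumulated last, so the loops above read cell 0 before it changes.
    for a in range(1, groups):
        lut.append((a, 1, 0))
    return lut


def stage2(mem: List[float], lut: Sequence[Triple]) -> List[float]:
    """One LUT row per clock: data_a + sign * data_b is written back to add_a."""
    for source, sign, dest in lut:
        mem[dest] += sign * mem[source]
    return mem


def outputs(mem: Sequence[float], r: int) -> List[float]:
    """Collect y_0 .. y_(r-1) from their DPRAM addresses."""
    base = 1 << (r - 1)
    return [mem[0]] + [mem[base + i] for i in range(r - 1)]


# r = 2: Stage 1 left A = 5 at address 0 and B = 3 at address 1.
mem = stage2([5.0, 3.0, 0.0], naive_stage2_lut(2))
print(outputs(mem, 2))  # [8.0, 2.0]
```

For r = 2 the three LUT rows compute $y_1 = A - B$ at address 2 and then $y_0 = A + B$ at address 0. In general this naive table has $s \cdot 2^s + 2^s - 1$ rows with $s = r - 1$.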

One embodiment of an apparatus implementing the above method is shown in Figure 5, which illustrates an implementation of the product of an r×n U-matrix A by an n-dimensional vector X. The matrix is represented by bits 0 and 1, where 0 corresponds to +1 and 1 corresponds to -1.
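The bit convention can be made concrete with two small helpers that expand a memory address into the ±1 vector it stands for. They read the bits least significant first, as in the address listing of Fig. 1, and map 0 to +1 and 1 to -1. The LSB-first order is read off the figures rather than stated in this paragraph, and the helper names are hypothetical.

```python
from typing import List


def address_to_bits(address: int, width: int) -> List[int]:
    """Bits of `address`, least significant first (as listed in Fig. 1)."""
    return [(address >> i) & 1 for i in range(width)]


def bits_to_bipolar(bits: List[int]) -> List[int]:
    """Map the stored representation to matrix entries: 0 -> +1, 1 -> -1."""
    return [1 - 2 * b for b in bits]


for addr in (0, 1, 2, 10):
    print(addr, address_to_bits(addr, 4), bits_to_bipolar(address_to_bits(addr, 4)))
# 0 [0, 0, 0, 0] [1, 1, 1, 1]
# 1 [1, 0, 0, 0] [-1, 1, 1, 1]
# 2 [0, 1, 0, 0] [1, -1, 1, 1]
# 10 [0, 1, 0, 1] [1, -1, 1, -1]
```

The printed ±1 vectors agree with the address labels in the drawing residue at the end of this section (Address #0 = [1, 1, 1, 1], Address #10 = [1, -1, 1, -1]).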

In this example the vector is real and v bits are dedicated to each component. In the construction below, the matrix A is stored in element 1 (RAM or ROM) and the vector X is stored in element 2 (RAM or ROM). The matrix and the vector may instead be supplied by any internal or external device, such as a memory device or a synchronized source like an ADC or a sequence generator.

A clock that controls modules 1, 2, 3 and 13 synchronizes the whole system. Module 3 is a counter that counts from 0 to C, where C is defined as follows:

$C = n + 2u_0 + u_1 + u_2 + \cdots + u_{\lceil \log_2 r \rceil - 1}$

$l(k,j) = \lceil (j-1) \cdot (r/2^k) \rceil$

$h(k,j) = \lceil j \cdot (r/2^k) \rceil - 1$

In stage 1 the incoming signals from elements 1 and 2 are written into element 13 according to addresses coming from element 3, where each address consists of the $\log_2 n$ least significant bits.

In addition, the counter is used as the address generator of an asynchronous ROM (element 5) of size $(C-n) \times (1 + r + r)$. All the bits of the counter go into element 4, a comparator that checks whether the current count is greater than n. Element 5 encompasses three fields: Sign, Source and Destination. The lines of the ROM should be ordered so that the first operation of stage 2 is triggered by the counter after n cycles.

For notational convenience, define s = r - 1. Bits $D_1, \ldots, D_s$ of the data bus from element 1 go into elements 6_1, ..., 6_s (which are inverters), respectively. Elements 7_1, ..., 7_s select between $D_1, \ldots, D_s$ and the outputs of 6_1, ..., 6_s according to the state of $D_0$:

($D_0 = 0 \Rightarrow$ 7_1, ..., 7_s = $D_1, \ldots, D_s$; $D_0 = 1 \Rightarrow$ 7_1, ..., 7_s = outputs of 6_1, ..., 6_s).

Elements 8_1, ..., 8_s select between the outputs of 7_1, ..., 7_s and the $\log_2(n)$ LSBs of the Destination output of element 5 (add_a), according to the state of the comparator output of element 4 (stage1_2n): stage1_2n = 0 $\Rightarrow$ 8_1, ..., 8_s = add_a; stage1_2n = 1 $\Rightarrow$ 8_1, ..., 8_s = 7_1, ..., 7_s.

Element 8_r selects between "0" and the r-th bit (the MSB of add_a) of element 5: stage1_2n = 0 $\Rightarrow$ 8_r = add_a MSB; stage1_2n = 1 $\Rightarrow$ 8_r = 0. The output of 8_1, ..., 8_r is used as the first address bus of element 13, which is a dual-port RAM of size $(2^{r-1}+r-1) \times (v+\log_2 n)$. This address selects the data_a bus, which is an output of element 13, and is also the destination of the data_in bus, which is an input to element 13.

add_b (the Source field output of element 5) is the second address bus of element 13; this address controls the data_b bus, which is an output of element 13.

Sign is a single-bit output of element 5 which multiplies data_b in element 12 (a multiplier).

The data_in bus, which is an input to element 13, comes from element 11, an adder that sums data_a and the output of element 9. data_in is written to the add_a destination synchronously, while data_a and data_b are read asynchronously.

Before operation, element 13 is initialized with zeros. Element 9 selects between the outputs of element 10 and element 12. Element 10 multiplies the data bus from element 2 by the LSB of element 1. After C cycles the resulting vector is stored in element 13 at address 0 and at addresses $2^{r-1}$ to $(2^{r-1}+r-2)$.
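The memory layout of this embodiment can be tabulated with two small helpers: the DPRAM geometry $(2^{r-1}+r-1) \times (v+\log_2 n)$ and the r output addresses. The helper names are hypothetical. Rounding $\log_2 n$ up when n is not a power of two is an assumption; presumably the extra bits provide headroom for accumulating up to n v-bit samples.

```python
from typing import List, Tuple


def dpram_geometry(r: int, v: int, n: int) -> Tuple[int, int]:
    """Return (words, bits per word) of a (2^(r-1)+r-1) x (v+log2 n) DPRAM.

    Assumption: log2 n is rounded up; (n - 1).bit_length() equals
    ceil(log2 n) for every n >= 1.
    """
    if r < 1 or n < 1:
        raise ValueError("r and n must be positive")
    words = (1 << (r - 1)) + r - 1
    width = v + (n - 1).bit_length()
    return words, width


def output_addresses(r: int) -> List[int]:
    """y_0 at address 0; y_1 .. y_(r-1) at 2^(r-1) .. 2^(r-1)+r-2."""
    base = 1 << (r - 1)
    return [0] + [base + i for i in range(r - 1)]


print(dpram_geometry(r=4, v=8, n=16))  # (11, 12)
print(output_addresses(4))             # [0, 8, 9, 10]
```

The largest output address, $2^{r-1}+r-2$, is exactly one less than the word count, so the outputs always fit in the DPRAM.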

Numerous variations and modifications of the invention will become readily apparent to 
those skilled in the art. Accordingly, the invention may be embodied in other specific forms 
without departing from its spirit or essential characteristics. The detailed embodiment is to be 
considered in all respects only as illustrative and not restrictive and the scope of the invention is, 
therefore, indicated by the appended claims rather than by the foregoing description. All
changes which come within the meaning and range of equivalency of the claims are to be 
embraced within their scope. 
WHAT IS CLAIMED IS:

1. A method for enhancing the efficiency of the performance of a linear transformation, represented by an r×n matrix, of at least one n-dimensional input vector over the real or complex field or a finite field, comprising:

storing in memory a ratio between each omitted row and each selected row;
omitting zero columns of said matrix and the corresponding scalar components of the input vector;

normalizing each column of said matrix;

generating a modified vector from groups of equal columns in the normalized matrix;

generating a modified matrix; and
obtaining the output vector.

2. The method of Claim 1, further comprising splitting the transformation
matrix into several sub-matrices and obtaining the output vector by unifying the output 
vectors resulting from the products of each sub-matrix. 

3. The method of Claim 2, wherein the modified matrix encompasses a
subset of rows of said transformation matrix. 

4. The method of Claim 3, further comprising splitting the input vector 
into several sub-vectors such that each sub-vector corresponds to a sub-matrix and wherein 
the output vector is obtained by adding the output vectors resulting from the products of each 
sub-matrix. 
5. The method of Claim 1, further comprising splitting a modified matrix into several sub-matrices, wherein an output vector is obtained by adding the output vectors resulting from the products of each sub-matrix by the respective sub-vector.

6. The method of Claim 1, further comprising normalizing each column 
of said matrix by multiplying the column by the inverse of a lead element. 

7. The method of Claim 1, wherein the output vector is a product of the
matrix and the input vector. 

8. The method of Claim 1, further comprising identifying groups of equal 
columns in the normalized matrix and attaching a unique location to each identified group. 

9. An apparatus for performing a linear transformation, comprising:
first and second inputs which receive input data and predetermined data;
transformation circuitry which acts on the input data and predetermined data;
control and address generation circuitry, connected to a first memory, which generates corresponding addresses for accessing cells of said memory, and for controlling the selection between a data receiving mode, in which data is received via said first input, and a data processing mode, in which the arrival of incoming data via said first input is blocked; and

counter circuitry for controlling the timing of the operations of the apparatus.

10. The apparatus of claim 9, wherein the transformation circuitry multiplies each element of the input data by a corresponding element of the transformation data.
11. The apparatus of claim 10, wherein the transformation circuitry comprises a memory which stores the result of the multiplication.

12. The apparatus of claim 9, wherein the transformation circuitry comprises summation and accumulation circuitry.

13. The apparatus of claim 9, further comprising a multiplexer circuitry which selects between the data receiving mode and the data processing mode.

14. The apparatus of Claim 9, wherein the control and address generation circuitry comprises:

a second memory which stores pre-programmed processing and control data; and

a comparator circuitry which switches between the data receiving mode and the data processing mode.

15. The apparatus of Claim 14, wherein the control and address generation circuitry further comprises:

a first set of multiplexers, each of which has at least one direct input for receiving transformation data, and another input into which said transformation data is fed via a corresponding inverter, said first set being controlled to transfer transformation data or inverted transformation data by a predetermined value provided by said transformation data;

a second set of multiplexers, each of which has at least one input connected to the output of a corresponding multiplexer selected from said first set of multiplexers, and another input connected to said second memory, said second set being controlled by said comparator circuitry to provide a first address to the first memory by transferring the output of each multiplexer from said first set to the output of its corresponding multiplexer from said second set, or to provide at least a portion of the second address to the first memory by transferring data stored in said second memory; and

a multiplexer, operating in combination with said second set of multiplexers in said data processing mode, having an unconnected input and an input connected to said second memory, and controlled by said comparator circuitry, thereby providing the remaining portion of said second address.
ABSTRACT

An improved method and apparatus for performing linear transformations, with a reduced number of arithmetic operations, simplified circuitry, and low power consumption. The method performs complex U-transformations, which appear in CDMA applications. The apparatus allows simultaneous analysis of an incoming signal at different spreading factors.


Address #1 = [1, 0, 0, 0]
Address #2 = [0, 1, 0, 0]
Address #3 = [1, 1, 0, 0]
Address #4 = [0, 0, 1, 0]
Address #5 = [1, 0, 1, 0]
Address #6 = [0, 1, 1, 0]
Address #7 = [1, 1, 1, 0]
Address #8 = [0, 0, 0, 1]
Address #9 = [1, 0, 0, 1]
Address #10 = [0, 1, 0, 1]
Address #11 = [1, 1, 0, 1]
Address #12 = [0, 0, 1, 1]
Address #13 = [1, 0, 1, 1]
Address #14 = [0, 1, 1, 1]
Address #15 = [1, 1, 1, 1]

Fig. 1
100

Address #0 = [1, 1, 1, 1] = $Y_0$
Address #1 = [-1, 1, 1, 1]
Address #2 = [1, -1, 1, 1]
Address #10 = [1, -1, 1, -1] = $Y_3$
Example 3: Ui-Matrix
The following sequence of steps describes an implementation of the GEM method 
in the case where the entries of the transformation matrix belong to the set Ui 